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Bivariate  exponential  (BVE)  models  have  been  widely  used  in  the  analysis  of 
competing  risks  data  involving  two  risk  components.  For  such  analysis,  frequentist 
approach  often  runs  into  difficulty  due  to  nonidentifiable  likelihood.  With  an  end  to 
overcome  the  nonindentifiability,  recent  literature  has  been  geared  towards  Bayesian 
analysis  with  informative  priors.  However,  systematic  prior  elicitation  is  often  diffi- 
cult. This  study  focuses  instead  on  Bayesian  analysis  with  noninformative  priors. 

Due  to  the  nature  of  the  competing  risks  data  and  BVE  model  structure,  quite 
often  it  leads  a likelihood  function  with  a nonregular  Fisher  information  matrix  im- 
peding thereby  the  calculation  of  standard  noninformative  priors  such  as  Jeffreys’ 
priors  and  their  variants.  As  a remedy,  a stagewise  noninformative  prior  elicitation 
strategy  is  proposed.  A variety  of  noninformative  priors  are  developed,  and  are  used 
for  data  analysis.  In  addition,  the  frequentist  probability  matching  criteria  are  inves- 
tigated among  the  newly  developed  noninformative  priors. 

Often  due  to  resource  limitation  or  other  economic  and  practical  reasons,  individ- 
uals are  only  periodically  screened.  The  resulting  competing  risks  data  are  necessarily 
discrete.  A class  of  flexible  models  is  introduced  to  handle  such  discretized  data. 


IX 


CHAPTER  1 
LITERATURE  REVIEW 

1 . 1 Introduction 


Often  in  a life-testing  situation,  failure  of  an  individual  can  be  identified  as  one 
or  more  of  s (s  > 1)  mutually  exclusive,  but  possibly  dependent  causes  of  failure. 
In  other  words,  each  individual  is  subject  to  s distinct  risks  referred  to  as  competing 
risks  threatening  its  life.  Associated  with  cause  i,  there  is  a nonnegative  absolutely 
continuous  random  variable  representing  the  lifetime  of  an  individual  when  no 
other  potential  risks  are  present.  Suppose  that  the  termination  time  of  an  individual 
is  defined  as  the  time  to  the  first  failure.  Thus,  lifetime  of  an  individual  is  given  by  T = 
min{T!,  • • -Ts}.  The  available  information  is  usually  given  by  the  pair  (T, /),  where 
/ indicates  the  cause(s)  of  failure.  The  competing  risks  concept  can  appropriately 
be  applied  to  many  areas  of  study,  such  as  industrial  reliability  analysis,  market 
transaction  analysis,  and  clinical  trial  on  paired  organs. 

Our  main  goal  is  to  use  Bayesian  methodology  for  making  statistical  inference  in 
competing  risks  models  to  study  certain  lifetime  features  of  interest  and  the  covariate 
effects  on  the  underlying  survival  functions.  The  first  step  is  necessarily  multivariate 
lifetime  modeling.  In  the  one-dimensional  case,  the  Weibull  distribution  has  been 
considered  as  the  most  flexible  one  for  lifetime  modeling  since  it  accommodates  three 
major  types  of  failure  rates:  aging  type,  decaying  type,  and  constant  type.  A natural 
extension  to  multidimensional  lifetime  modeling  would  be  the  multivariate  Weibull 
model.  An  important  special  case  is  the  multivariate  exponential  distribution.  When 
the  shape  parameters  are  known,  then  one  can  make  a power  transformation  on 
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Weibull  lifetimes  to  bring  the  resulting  distributions  into  the  exponential  form.  Our 
primary  emphasis  will  be  on  the  two-dimensional  case.  As  is  well-known,  typical 
bivariate  exponential  distributions  are  generated  from  Poisson  processes.  Some  are 
designated  for  distinctive  characteristics  of  lifetimes,  such  as  memorylessness  and 
absolute  continuity  (Freund  (1961)  and  Sarkar  (1987)).  There  exists  an  inevitable 
tradeoff  between  these  properties  except  when  the  two  variables  are  independent 
(Block  and  Basu,  1974).  Also,  practical  considerations  have  often  led  to  bivariate 
exponential  distributions.  For  example,  Ryu’s  (1993)  shock  model  is  derived  purely 
from  practical  considerations. 

The  organization  of  this  chapter  is  as  follows.  In  Section  1.2,  we  introduce  the 
concept  and  applications  of  competing  risks.  Also,  one  will  be  reviewing  some  rel- 
evant work  in  this  area.  A detailed  discussion  of  bivariate  lifetime  models  will  be 
illustrated  through  examples  in  Section  1.3.  Formulation  of  noninformative  priors 
will  be  reviewed  in  Section  1.4.  In  Section  1.5,  our  objectives  for  studying  competing 
risks  models  with  Bayesian  methodology  are  proposed. 

1.2  Competing  Risks 
1.2.1  Concepts  and  Applications 

Consider  a study  of  an  in-processing  population.  Independent  individuals  of  the 
population  are  exposed  to  s mortality  sources  that  cause  termination  of  the  current 
process  (e.g.,  life,  presence  of  certain  feature),  namely,  failure.  Defined  in  general, 
the  term  “lifetime”  is  the  time  length  between  entry  of  the  study  and  the  time  of 
termination.  Lifetime  of  each  individual  hence  is  subject  to  the  s risk  factors  which 
are  referred  to  as  “competing  risks”.  This  concept  is  applied,  for  example,  in  a 
series  system  consisting  of  s components,  where  failure  of  any  one  of  the  components 
leads  to  failure  of  the  whole  system.  For  convenience,  we  shall  often  refer  to  the 
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s-component  series  system  as  an  illustrative  example.  As  an  example  of  animate 
application,  a clinical  trial  is  considered,  where  the  effect  time  of  a new  treatment  for 
a certain  disease  on  pairs  of  organs  measures  the  time  difference  between  the  cure  and 
the  time  to  the  first  infection  of  the  diagnosed  organ (s).  In  the  simplest  competing 
risks  setting,  coherence , independence,  and  mutual  exclusiveness  are  assumed  among 
the  possible  risks.  Coherence  is  our  major  assumption,  meaning  that  individuals 
experience  simultaneous  exposures  to  all  possible  risk  factors  throughout  the  study. 
Coherence  is  only  convenient  for  counting  lifetime.  The  mutual  exclusiveness  can  be 
achieved  by  refining  the  classification  of  the  risk  factors.  However,  the  independence 
assumption  may  not  always  be  appropriate.  If  this  is  the  case,  then  one  must  consider 
multivariate  lifetime  distributions.  Our  primary  interest  in  competing  risks  context 
are:  the  survival  or  failure  rate  in  the  presence  of  all  risk  factors  and  the  cause-specific 
probabilities.  To  this  end,  we  need  the  knowledge  of  the  distribution  of  (T,  I). 

1.2.2  Models  and  Likelihood  Functions 

Let  0,  L(.),  P(.),  and  p(.),  and  S(.)  denote  model  parameter  vector,  likelihood 
function,  probability  measure,  probability  density,  and  survival  probability  measure 
respectively.  The  joint  probability  density  function  of  (Ti,---,TS)  is  denoted  by 
f(h,  • • • , t8 \0),  and  the  joint  survival  function  f(yu  • • • , ys\G)  dyx,  • • • , dys  is 


identifies  the  cause  of  failure.  Suppose  that  (T,  I)  = (t,  i ) is  observed.  Then  the  joint 
density  p(t,  i\0)  is  given  by 


denoted  by  S(tu  • • • ,ts \G),  where  G = (9U-  ■ -9k).  Let  T = min{Ti,  • • • ,TS},  while  I 


Alternatively,  (1.1)  can  be  rewritten  as 


p(t,i\G) 


dS(tu  - ■ ■ ,ts\G) 

dti 


ti=t2  = ---=ta=t 
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Under  the  independence  of  Ti,  • • • ,TS,  (1.1)  simplifies  to 

jfr 

where  fa  and  Si  denote  the  marginal  pdf  and  survival  function  for  the  ith  component, 
respectively.  However,  this  assumption  of  independence  usually  is  not  practical. 

For  modeling  dependence,  Chiang  (1961)  suggested  the  constant  proportional  haz- 
ard rate  model.  Let  f(t\0)  and  5(f|0)  denote  the  marginal  pdf  and  survival  function 
of  T,  respectively.  The  model  assumes  that  the  proportions  of  component-specific  in- 
stantaneous failure  rates,  {r*(0)  = ri(t\Q)  = p(t,i\0)/S(t\9 ) : 1 < i < s},  are  constant 
through  time.  As  a mathematical  result,  the  likelihood  function  is  fully  determined 
by  f(t)  (or  S(t))  and  the  proportion  constants  {ri(0),  • • • , rs(0)},  i.e., 

p(t,i  i«>  = n[5(ii9)]r'<S).  (i.2> 

The  Weibull  (including  the  exponential  distribution  as  a special  case)  and  the  Gom- 
pertz  distribution  (equivalent  to  logarithm  of  a Weibull-truncated-at-1)  are  some  of 
the  distributions  that  have  been  widely  considered  for  one-dimensional  lifetime  mod- 
eling. 

Lagakos  and  Williams  (1977)  and  Kalbfleisch  and  Mackay  (1979)  proposed  the 
constant-sum  model  in  the  analysis  of  informative  censored  survival  data.  Under 
the  rationale  that  censoring  constitutes  an  additional  source  of  risk  by  preventing  the 
identification  of  any  future  failure,  this  development  is  also  appropriate  for  competing 
risks  modeling,  where  s — 2.  Let  7\  and  T2  denote  failure  time  and  censored  time, 
respectively.  Let  0 = (0U  02),  and  Nt  = (t,  t + r).  Two  quantities,  a(t\0)  — P(T2  G 
Nt,I  = 2| T2  G Nt,  0)  and  du(t\6)  = P(Ti  G Nt,I  = 1|T2  > t,9 ),  are  defined  such 
that  a(t\0)  -I-  u(t\0)  = 1.  With  this,  the  joint  pdf  p(t,  i\9)  is  given  by 


P(t,i \9)  -Kf|0)dF2(t|02)]1^][(l-F2(t|02))dw(t|0)]1[^] 

- [a{t\9)dF2{t\92)Y^  [(1  - F2(t\92))  d(  1 - a^G))]1^), 


(1.3) 
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where  d denotes  the  differentiation  operator  and  1 is  the  conventional  indicator  func- 
tion. 

Other  alternatives,  such  as  the  conditional  independence  technique,  have  also 
shown  their  strength  in  the  competing  risks  modeling.  Motivated  by  Vaupel  (1979), 
Clayton  and  Cuzick  (1985)  made  use  of  the  positive  random  quantity,  frailty,  to  con- 
struct models  for  correlated  lifetimes.  Given  the  unobserved  frailty,  the  true  hazard 
functions  act  multiplicatively  to  the  baseline  hazards  and  they  are  conditionally  in- 
dependent. Suppose  that  the  system  frailty  w is  characterized  by  the  pdf  7r(iu|?7). 
Let  {hi(t \9)  = — d\og[Si(t\0)]/dt  : 1 < i < s}  be  the  marginal  hazard  rates  and 
{Hi(t\&)  = — log[S'j(t|0)]  : 1 < i < s}  be  the  corresponding  cumulative  hazard  rates. 
Then 


r -J2  Hi{t\6)w 

p(t,i\9,rf)  = hi(t\9)we  <=1  T:(w\r])dw.  (1.4) 

In  order  to  express  the  integral  explicitly,  one  needs  to  specify  the  pdf  7r(w\rj)  in  ad- 
dition to  the  underlying  lifetime  distribution.  Gamma  or  positive  stable  distribution 
are  some  of  the  most  commonly  used  choices. 

From  now  on,  we  shall  be  considering  exclusively  the  case  s = 2.  In  the  following 
section,  the  bivariate  Weibull  family  will  be  reviewed.  Naturally,  its  scope  extends 
well  beyond  the  analysis  of  competing  risks  data. 

1.3  Bivariate  Lifetime  Distributions 


The  parametric  modeling  for  a one-dimensional  survival  function  5(^0)  is  usu- 
ally determined  by  its  instantaneous  hazard  rate  h(t\9),  or  by  the  cumulative  hazard 

function  H(t\9)  since  S(t\9)  - e~Hd\0)  = e~f0  Qne  famiiiar  example  is  the 

constant  hazard  rate  which  gives  rise  to  an  exponential  lifetime  model.  Its  generaliza- 
tion to  the  two  dimensional  cases  may  be  based  on  several  considerations:  absolute 
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continuity  of  the  joint  distribution,  exponentiality  of  the  marginals,  exponentiality  of 
the  minimum,  or  positive  (or  negative)  correlation  between  Tj  and  T2. 

Due  to  its  flexibility,  the  bivariate  Weibull  (BVW)  family  has  been  widely  applied 
for  bivariate  lifetime  modeling.  We  especially  incorporate  it  with  the  competing  risks 
context.  A bivariate  distribution  function  is  often  referred  to  as  a BVW  if  it  meets  at 
least  one  of  the  following  conditions:  (l.a)  it  has  Weibull  distributions  marginally,  e.g., 
Marshall-Olkin  (1967)  BVW,  and  (l.b)  the  minimal  lifetime  has  Weibull  marginal, 
e.g.,  Lu  and  Bhattacharyya’s  (1990)  BVW. 

We  may  observe  that  bivariate  exponential  distribution  (BVE)  is  a special  case  of 
a BVW.  Essentially,  the  BVE  family  constitutes  the  foundation  of  BVW.  In  the  next 
subsection,  we  focus  attention  exclusively  to  the  BVE  subfamily. 

1.3.1  Bivariate  Exponential  Distributions 

A BVE  can  be  generalized  in  several  ways.  Below,  we  discuss  certain  properties 
which  are  useful  for  such  generalization. 

1.3. 1.1  Loss-of- Memory  Property  and  Absolute  Continuity 

The  very  unique  characterization  for  the  exponential  distribution  is  the  loss-of- 
memory  property  (LMP).  The  interpretation  for  LMP  is  that  given  the  current  sur- 
vival status,  the  joint  distribution  of  the  residual  lifetimes  remains  the  same  as  the 
unconditional  joint  lifetime  distribution.  For  the  univariate  case,  under  assumed 
continuity  of  the  distribution,  this  leads  to  an  exponential  distribution.  In  the  multi- 
variate case,  this  property  is  sometimes  cited  in  a more  specific  way.  Suppose  that  the 
property  holds  marginally  (i.e.  Tj,  T2,  or  T is  marginally  exponentially  distributed). 
We  refer  to  it  as  the  marginal  loss-of-memory  property  (MLMP).  The  term  used  to 
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describe  LMP  for  a two-dimensional  random  vector,  say  (TX,T2),  is  the  joint  loss-of- 
memory  property  (JLMP).  It  is  defined  by 

S(tl+y,t2  + y\0)  = S(ti,t2\0)  S(y,y\0),  (1.5) 

V t\  > 0,  t2  > 0,  and  y > 0. 

The  family  of  bivariate  distributions  satisfying  (1.5)  has  been  characterized  by  Mar- 
shall and  Olkin  (1967).  Let  Sx(t |.)  = P(T 1 > t\.)  and  S2(t\.)  = P(T2  > t\.).  The 
survival  function  is  in  the  form 


S(ti,t2\X,0i,G2)  — e At3  3 Sj(tj  — i3-il^j)l[o<t3_j<^]>  (1-6) 

This  implies  that  (2. a)  The  minimal  lifetime  T is  independent  of  the  residual  lifetime 
|Ti  — T2\  and  the  indicator  1[t1<t2];  and  (2.b)  T is  exponentially  distributed. 

One  immediate  example  is  the  well-cited  Marshall-Olkin  (1967)  BVE  model  which 
satisfies  both  MLMP  and  JLMP.  However,  as  shown  by  Block  and  Basu  (1974), 
without  the  independence  between  Tx  and  T2,  absolutely  continuity  (i.e.,  joint  pdf  of 
(Ti,T2)  exists  and  P(Tj  = T2)  = 0),  MLMP  and  JLMP  cannot  exist  simultaneously. 
More  exactly,  under  dependence,  if  any  of  two  among  MLMP,  JLMP,  and  absolute 
continuity  are  satisfied,  the  joint  density  must  be  lack  of  the  other  property.  Marshall- 
Olkin’s  BVE  retains  the  dependence  instead.  In  contrast,  the  following  BVE  models 
ensure  the  absolute  continuity  by  giving  up  either  MLMP  or  JLMP. 

Freund’s  (1961)  absolute  continuous  bivariate  exponential  model  (ACBVE)  was 
originally  designated  for  the  parallel  system.  Let  the  model  be  characterized  by 
parameters  (a,  a , (3,  /?'),  where  0<a<  a <a  + (3,  0</3</3'<a  + /3.  It 
assumes  that  Tj  and  T2  follow  independent  exponential  distributions  with  constant 
failure  rates,  a and  /?,  respectively,  when  the  system  is  under  normal  process.  The 
dependence  initiates  when  any  one  of  the  two  components  collapses.  More  specifically, 
the  failure  rate  of  component  1 increases  to  ol  right  after  component  2 fails.  Similarly, 
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failure  rate  of  component  2 increases  by  (01  - 0 ) while  component  1 fails.  This 
phenomenon  exhibits  in  the  joint  probability  density 


f(ti,t2\a,a',p,  0') 


apf  0 < ti  < t2 

0a  e-(P+<*-<*')t2-ah  0 < <2  < h 


(1.7) 


Rephrasing  hazard  rate  by  workload  (in  unit  time),  Freund’s  BVE  explains  the  system 
with  constant  workload  (a  + /3)  , in  which  the  load  is  dynamically  transferred  between 
two  components.  When  one  of  the  components  fails,  the  load  for  this  component  is 
reduced  (by  the  amount  of  (0'  — 0)  for  component  1 and  ( a ' — a)  for  component  2). 
The  excess  load  (01  — 0)  (or  (a'  — a))  then  shifts  to  the  partner  component.  Then 

S(t1,t2\a,a',0,0,)=e^a+^  Sjfo  - t3_j\a,  a',  0,  /?,)l[o<t3-J<td’  (L8) 

where 


Si(t\a,a',p,p') 

S2(t\a,a',0,0') 


ft  e-dt  + (a  a ) Q-(a+0)t 

a + 0 — oi.  a + 0 — a 

a e-0't  , (ft~ft)  e-(a+/»)t 

a + 0-0'  a + 0-0' 


Apparently,  JLMP  is  confirmed  by  (1.5).  The  marginals  of  Tj  and  T2  are  no  longer 
exponentially  distributed  as  expected.  They  are  mixtures  of  two  exponentials. 

To  achieve  JLMP  and  nonsingularity,  Block  and  Basu  (1974)  modified  Marshall- 
Olkin’s  BVE  by  giving  up  MLMP.  Their  ACBVE  has  the  survival  probability  given 
by 

S(tu  f2|A1}  A2,  A12)  = e-(Al+A2+Al2)t3^  Sjitj  - t3-j)l[o (1.9) 


where 


5i(t|Ai,  A2,  Ai2) 
52(t|Ai,  A2,  Ai2) 


A!  + A2  + Al2  (Ai+AiaU 
Ai  + A2 


p— (Ai+A2+Ai2)t 

A,  + A2 


Ai  + A2  + A 


13  p— (A2+A12  )t 


A 


12 


0 — (Al+A2+Ai2)£ 


Ai  + A2 


Ai  + A2 
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It  indeed  is  a restricted  version  of  Freund’s  ACBVE.  The  constraints  of  parameters 
are:  a = Ai  + Ai^A2  A12,  /?  = A2  + A;^Aa  A12,  ol  = Ai  + A12,  and  (3  — A2  + A12. 

Physically,  the  model  assumes  that  the  proportions  of  excess  workloads  for  both 

components  are  equal,  i.e.,  QL^SL  = Moreover,  by  expressing  the  survival  function 
of  the  Marshall-Olkin  BVE  in  Friday’s  (1977)  form,  one  finds  that  the  Block-Basu 
ACBVE  is  the  absolutely  continuous  component  in  the  expression. 

Sarkar’s  (1987)  version  of  ACBVE  secures  MLMP  and  absolute  continuity.  This, 
however,  is  no  longer  JLMP,  nor  the  model  has  a clearly  physical  interpretation. 
Sarkar’s  joint  survival  function  is  given  by 

S(<1,t2|A1,A2,A1J)  = - [1  - — e-AiqI+r}l[o<t,<tw], 

where  r = , . 

A1+A2 

1.3. 1.2  Shock  Models 


The  idea  of  “shocks”  was  developed  based  on  the  relationship  between  Poisson 
process  and  the  exponential  distribution  (cf.  Marshall  and  Olkin  (1967)  and  Arnold 
(1975)).  Imagine  that  the  occurrence  of  each  mortality  force  as  random  shocks. 
Shocks  occur  in  time  according  to  Poisson  process.  A failure  is  induced  when  the 
accumulated  damage  caused  by  shocks  exceeds  the  threshold.  Generally,  two  types  of 
shocks  are  considered:  fatal  (noncumulative)  shocks  and  cumulative  shocks.  These  are 
distinguished  by  recovered  (renewal)  probability,  denoted  by  r(t\j)  which  measures 
the  survival  probability  at  time  t given  experience  of  j shocks.  Let  N(t)  denote  the 
stochastic  process  that  counts  the  occurrences  of  shocks.  Thus,  r(t\j)  = P(T  > 
t\N(t—)  = j),  for  0 < t < 00,  0 < j < 00.  For  fatal  shocks,  r(t\j)  — 0,  for 
0 < t < 00,  1 < j < 00.  This  implies  that  N(t)  achieves  its  maximum  value  at 
1.  On  the  other  hand,  r(t\j)  > 0,  for  0 < t < 00,  1 < j < 00,  if  the  shocks  are 
cumulative.  The  joint  survival  function  for  a shock  model  is  determined  by  recovered 
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probabilities  and  the  corresponding  stochastic  processes  (or  the  distribution  of  time 
interval  of  successive  shocks).  We  shall  now  make  this  concept  more  concrete  through 
the  following  examples. 

The  derivation  of  the  Marshall-Olkin  (1967)  BVE  follows  the  fatal  shock  princi- 
ple. Consider  in  the  two-component  series  system,  components  are  damaged  by  “fatal 
shocks”  from  different  sources.  Let  Ny(t),  N2(t),  and  Nn (t)  respectively  character- 
ize the  occurrence  of  shocks  experienced  by  component  1 only,  component  2 only, 
and  both  jointly.  Suppose  that  Ni(t),  N2(t),  and  N12(t)  are  independent  Poisson 
processes  with  respective  intensities  Ai,  A2,  and  A12.  The  term,  intensity,  stands 
for  the  frequency  of  occurrences  of  the  undergoing  stochastic  process,  or  is  equiva- 
lent to  the  hazard  rate  in  the  corresponding  survival  function.  Denote  the  renewal 
probabilities  associated  with  Ny,  N2,  and  Ny2  by  ry,  r2,  and  ri2,  respectively.  Since 
ri(t\j)  = r2(t\j)  = r\2(t\j)  = 0 for  j > 1,  the  joint  survival  function  S(ti,t2\.)  is  found 
from 


5(fi,  f2|Ai,  A2,  Ai2) 

= P(T1>tl,T2>t2\X1,X2,Xl2) 

— P (Ny(ti)  = 0,  N2(t2 ) = 0,  A^i2(max{fi,  t2})  = 0 | Ai,  A2,  Ai2)  . 

Under  the  assumption  of  independence  among  Ny,  N2,  and  Ny 2,  one  obtains 

S(ty,t2 |Ai,  A2,  Ai2)  = exp{— AiU  - A 2t2  - Ai2  max(U,  t2)}.  (1.10) 

Thus,  MLMP  can  be  quickly  identified.  Rewrite  (1.10)  in  the  form  of  (1.6),  then 

S(ty,  t2\Xy,  A2,  A12)  = e-(Al+A2+Al2)‘3->  e-(A2+Al2)(b-t2-Pl[0<t3_.<t.].  (1.11) 

One  can  further  validate  its  JLMP. 

“Cumulative  shocks”  model  is  exemplified  by  Downton’s  (1970)  BVE.  This  shock 
model  comprises  of  two  independent  Poisson  shocks  Ny(t)  and  N2(t),  of  which  the 
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joint  distribution  of  the  occurrence  of  shocks  follows  a bivariate  geometric  structure1 
such  that  ri(t\j)  — r2(t\j)  = 77,  for  0 < t < oo,  0 < j < oo.  Under  the  constant 
recovering  probability  assumption,  exponential  marginals  are  acquired.  The  derived 
joint  lifetime  density  is  given  by 


1 

(1  - 77)AiA; 


exp 


AiU  + X2t2 

(1  — v) 


B0 


f ‘^{,nX\X2t\t2)  2 

V 


(1.12) 


where  Bn  is  the  modified  Bessel  function2  of  the  first  kind  of  order  n. 

The  “weighted  shocks  model”  is  attributed  to  Ryu  (1993).  This  development 
features  a class  of  bivariate  lifetime  models  with  marginally  progressing  hazard  rates. 
Let  Ni(t),  N2(t),  and  Ni2(t)  be  defined  as  earlier.  In  the  generalized  version,  Ryu 
modeled  the  hazard  rates  as  weighted  sum  of  cumulative  shocks,  say  diNi(t)  + 
SiNi2(t)  for  component  1 and  d2N2(t)  + s2Ni2(t)  for  component  2.  It  is  the  weights 
that  give  the  flexibility  for  modeling.  These  weights  can  be  analogous  to  the  recovery 
probabilities,  i.e.,  if  the  weight  is  finite,  the  associated  shock  is  cumulative,  otherwise 
it  is  noncumulative.  In  general,  the  survival  function  is  given  by 


5(^1,  f2|Ai,  A2,  A12) 

= exp{— (Aj  + Ai 2)tj  - A jtz-j  + ^-[1  - e~djtj]  + ^-[1  - e-*-'*3-*]} 

dj  d3-j 

x exp{  — [1  - e-sTb-«3-d]  + ....  Al2  _ e-*s-J*3-j 

Sj  + s2 ) 


Thus,  there  is  no  indication  of  JLMP  from  this  probability  presentation.  Neverthe- 
less, it  enjoys  both  JLMP  and  MLMP  to  a certain  degree  of  approximation.  More 
specifically,  the  Marshall-Olkin’s  BYE  is  found  at  the  boundary  of  this  class,  where 


Si  — s2  — d\  — d2  — oo. 


1 It  implies  that  marginally,  the  total  number  of  shocks  that  bring  failure  follows  the  geometric 

distribution. 


2Bn{x) 


- V IzB. 


E 

r— 0 


!(n+r)! 
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1.3.1. 3 Gumbel’s  Models 


Gumbel  (1960)  proposed  two  types  of  BVE  models  purely  from  the  considerations 
of  exponential  marginals  and  certain  relationship  between  the  two  variates.  Thus, 
only  MLMP  is  clear  in  this  motivation.  We  shall  name  these  two  BVE’s  as  Gumbel- 
A BVE  and  Gumbel-B  BVE.  The  respective  joint  survival  functions  are  given  by 


SGA(ti,t  2)  = e~Xltl-X2t2~Xl2tlt2 , 

(1.13) 

SGB(ti,t2)  = e~KAltl)1/r+(A2t2)1/rir. 

(1.14) 

It  turns  out  that  both  Gumbel  BVE’s  are  ACBVE  with  MLMP.  The  Gumbel-B  BVE 
satisfies  (2. a),  yet  it  does  not  possess  the  JLMP. 

1.3.2  Bivariate  Weibull  Distributions 

A typical  technique  used  for  constructing  a bivariate  Weibull  distribution  (BVW) 
is  to  make  power  transformation  of  marginals  of  BVE.  Other  alternatives,  for  instance, 
random  hazard  modeling  and  function-specific  modeling  are  also  commonly  used. 
These  BVW’s  will  be  introduced  next. 

1.3. 2.1  Power  Transformation 

Lee  (1979)  made  the  transformation  from  the  Gumbel-B  BVE.  The  Marshall-Olkin 
BVE  was  generalized  to  a BVW  with  joint  survival  function 

S(ti,t2\\i,  X2,  A12,ri,r2)  = expj-Ai^1  - X2t22  - Xi2  max(tj1,  tr22)}.  (1.15) 


By  restricting  n — r2  — r in  (1.15),  Lee  and  Thompson’s  (1974)  BVW  is  obtained. 
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1 .3.2.2  Random  Hazard  Models 


BVW  can  also  be  generated  from  random  hazard  models  subject  to  Weibull 
marginals.  The  underlying  hazards  are  adjusted  multiplicatively  by  the  random  envi- 
ronmental hazard  w.  Essentially,  this  modeling  strategy  is  similar  to  that  of  the  frailty 
model.  In  Vaupel’s  (1979)  frailty  model,  two  lifetimes  are  conditionally  independent 
of  each  other.  The  random  hazard  model  instead  allows  conditional  dependence. 
Let  H\  and  H2  be  the  respective  cumulative  hazard  functions  for  Xj  and  X2,  and  r 
(0  < r < 1)  be  the  shape  parameter.  Given  the  random  quantity  w,  the  conditional 
survival  function  has  the  form 

S(ti,t2\w,r,  9)  = exp{-{[tf1(t:|0)]1/r  + [ H2(t2\G)]1/r]}rw } . 

Let  7r(w)  be  the  pdf  for  w.  Then  the  marginal  survival  function,  or  the  Laplace 
transformation  of  w evaluated  at  {[i/i(t1|0)]1/r  + [H2(t2\G)]1/r]}r , is  given  by 

S(ti,t2\r,  0)  = j exp  {— {[//i(t1|0)]1/r  + [H2(t2\G)]1/r]}rw}  n(w)  dw. 

First  assume  that  w ~ G(a,  l)3  and  Weibull  marginals  for  Tx  ~ W(0i,Ai)4  and 
T2  ~ W(92,  A2),  the  joint  survival  function  is  derived  as 

S(tut2)  = {1  + [ (e(Altl)Sl  - l)x/r  + (e^62  - l)l!r  Y}~a.  (1.16) 

This  survival  function  was  considered  by  Lu  and  Bhattacharyya  (1990).  Oakes  (1982) 
has  discussions  for  the  special  case  when  r = 1 . 

Next,  assume  that  w follows  a positive  stable  density  with  a scale  parameter  a, 
i.e.,  E[exp(—wy)\  = exp  (—ay).  Marginal  distributions  of  T\  and  T2  remain  the  same 
as  above.  The  joint  survival  function  becomes 

S(tut2)  = exp{— [(A1t1)6'1/or  + (\2h)92/ar]ar},  (1-17) 


3G(a,b)  denotes  gamma  distribution  with  shape  parameter  a and  scale  parameter  b,  ( a,b  > 0). 

4W (a,  b)  denotes  Weibull  distribution  with  shape  parameter  a and  scale  parameter  b,  ( a , b > 0). 


14 


provided  that  ( a,r ) 6 (0,1]  x (0,1].  (1.17)  can  also  be  obtained  by  making  power 
transformation  from  the  Gumbel-B  ACBVE  model  (see  (1.14)).  Hougaard’s  (1986) 
derivation  is  a special  case  of  (1.17),  where  r — 1. 

1 .3.2.3  Function-Specified  Models 

Lu  and  Bhattacharyya  (1990)  generalized  two  classes  of  absolute  continuous  BVW 
models,  of  which  the  joint  survival  functions  are  given  by  the  expression 

S(t1,t2)  = exp{— (Aifi)01  - (A 2t2)°2  - dog(ti,t2)},  (1.18) 

where  g(ti,t2)  is  differentiable  and  could  depend  on  other  parameters. 

Case  1:  They  first  considered  90  > 0 and  g(t\,t2)  = [(A iti)01/™  + (X2t2)02^m]m  (0  < 
m < 1)  in  (1.18).  Then 

S(h,t2)  = exp{-(A1f1)01  - (X2t2)02  - 0o[(Aiti)*l/m  + (X2t2)0>'m]m}.  (1.19) 

Case  2:  Before  introducing  case  2,  two  dependent  structures  are  defined.  Random 
variables  X and  Y are  said  to  be  positive  quadrant  dependence  PQD  (Lehman,  1966) 
if  for  all  (x,  y ), 

P(X  > x,Y  > y)  > P(X  > x)P(Y  > y).  (1.20) 

Negative  quadrant  dependence  (NQD)  is  correspondingly  defined  by  reversing  the 
inequality  in  (1.19).  Most  of  the  BVW’s  with  survival  function  in  the  form  of 
exp {k(x,y)}  are  either  strictly  PQD  or  NQD.  For  example,  the  Gumbel-A  ACBVE 
is  strictly  NQD,  while  the  BVW  given  in  (1.17)  is  strictly  PQD.  A bivariate  lifetime 
model  that  admits  both  PQD  and  NQD  is  pursued  with  the  joint  survival  function 

S(tut2)  = Si(ti)S2(t2)  e0°f1-,SlOi)}{1-s,2(t2)}  (L21) 

A NQD  survival  function  is  obtained  under  — 1 < 90  < 0,  while  if  90  > 0,  it  is  a PQD 
survival  function.  Particularly,  let  Si,  S2  be  the  Weibull  survival  functions.  A BVW 
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is  then  constructed  with  joint  survival  function 

5(ti,i2|A1,  A2, 0o,0i,02)  = exp{-(A1t1)ei  - (\2h)e2+90[l  - e-^^Hl  - e-^’2]}. 

1.3.3  Use  of  Concomitant  Information 


To  account  for  the  presence  of  concomitant  information  in  lifetime  analysis,  we 
briefly  review  some  typical  strategies  considered  in  the  past.  In  the  classic  regression 
modeling  for  one-dimensional  response,  the  expectation  of  the  response  variable  con- 
ditional on  knowledge  of  the  covariates  (say  vector  z)  is  assumed  to  be  some  function 
(usually  linear)  of  the  covariates.  Feigl  and  Zelen  (1965)  and  Zippin  and  Armitage 
(1966)  recommended  a similar  adjustment  for  the  mean  lifetime  by  equating 

E(T\z)  = z'(3. 

However,  since  lifetime  is  always  positive,  the  range  of  (3  is  restricted.  As  a remedy, 
Cox  (1972)  proposed  a general  proportional  hazard  (or  accelerated  lifetime)  model  via 
adjusting  baseline  hazard  rate  with  a positive  multiplicator,  a function  of  covariates, 
say  g(z,(3).  The  pdf  of  the  survival  time  is  taken  as 

p(t\z)  = ho(t)g( z,  (3)  exp  {-H0(t)g( z,  (3)}. 

With  g(z,  f3)  = ez'@,  the  range  of  (3  is  now  free  from  the  covariates  z.  Klein  and  Basu 
(1982)  adopted  the  proportional  hazard  structure  for  independent  Weibull  variates. 

The  contemporary  GLM  technique  will  be  described  in  detailed  in  Chapter  3.  For 
example,  Wada,  Sen,  and  Simakura  (1996)  employed  Sarkar’s  ACBVE  model,  and 
adjusted  parameters  with  covariates  z via  logit  regression  model  and  proportional 
hazard  model.  These  models  are  given  by 


Ai(z)/A2(z)  = exp(z'7),  A(z)  = Ai(z)  + A2(z)  + A12(z)  = exp(z  ’(3).  (1.22) 
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1.3.4  Statistical  Inference 

Inference  for  competing  risks  models  has  mostly  been  classical.  The  maximum 
likelihood  estimators  (MLE)  and  the  corresponding  estimated  asymptotic  variance- 
covariance  matrices  were  often  used  for  the  construction  of  tests  and  confidence  in- 
tervals. The  estimated  asymptotic  variance-covariance  matrix  is  the  inverse  of  the 
Fisher  information  matrix  evaluated  at  the  MLE’s  of  the  parameters.  However,  this 
approach  demands  the  use  of  profile  likelihood  or  the  integrated  likelihood  when 
the  full  likelihood  is  not  fully  identifiable.  This  approach  can  be  challenged  by  the 
non-uniqueness  of  MLE’s  and  the  difficulty  of  asymptotics.  We  illustrate  below  for 
some  simple  competing  risks  models  to  see  how  such  estimators  are  obtained.  Let 
Tj  = and  ( tj,ij ) denotes  the  realization  of  (Tj,Ij),  for  1 < j < n. 


n\2  = 0 under  the  absolutely  continuous  assumption. 

Example  1 We  begin  the  first  example  with  the  Marshall-Olkin  BVE  model  given  in 
(1.10).  The  joint  likelihood  function  is  given  by 


n 


n 


n 


£ 1[q,2<q,i])  and  ni2  = £ 1[q,1=q,2]-  Notice  that 


n 


T(Ai,  A2,  A12)  oc  Aini  \2n2  A]^"12  exp{— A ^ tj}, 


(1.23) 


where  A = Ai  + A2  + Ax2.  The  MLE’s  are 


n 


n 


n 


Ai  = niQT*j)  \ \2  = n2{Ylti) 


\ A12  — ni2(^ij)  l. 


The  estimated  asymptotic  variance-covariance  matrix  follows 


Ai  0 0 


I 1(Ai,  A2,  A12)  — 0 A2  0 

0 0 A12 
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In  this  example,  MLE’s  and  their  asymptotic  behaviors  can  be  easily  verified. 


Example  2 Consider  the  Gumbel-B  ACBVE  model  given  in  (1.14).  The  joint  likeli- 
hood function  is  given  by 

L{ A,  p,  r)  oc  A"  pnx  (1  - p)n2  exp{-A  ^ tj}l[o<r<i],  (1-24) 

3= 1 


where  A = (Aj^r  + A^r)r  and  p 


p/r 


(A;/r+A^r) 


i -nr.  Clearly,  this  likelihood  is  nonidentifiable. 


As  a consequence,  the  Fisher  information  matrix  I(Ai,A2,r)  is  singular.  The  MLE’s 
are 


A = n 


p = ni/n. 


(1.25) 


According  to  the  solution,  there  is  no  unique  MLE  for  r,  and  the  asymptotic  theory 
can  not  be  applied.  Suppose  that  r can  be  treated  as  a nuisance  parameter.  One  may 
approach  this  problem  from  either  the  profile  likelihood  or  the  integrated  likelihood 
angle.  Both  shall  also  provide  the  same  MLE’s  as  above.  However,  r may  carry  an 
important  message  for  the  underlying  study.  Then  how  does  one  provide  a solution? 


Example  3 Let  assume  Block-Basu’s  ACBVE  model  given  in  (1.10).  The  likelihood 
function  for  {(tj,ij)  ■ 1 < j < n}  is  given  by 

L{ A,  p,  4 J)  a A"  p"1  (1  - p)"2  exp{— A ij}l[0<^<A],  (1-26) 

3 = 1 

where  A = Ai  + A2  + Ai2,  0 = Ai  + A2,  and  p = For  a fixed  0,  the  MLE’s  are 

A = min{0,n(^tj)_1},  (1.27) 

3 = 1 


p = ni/n. 


(1.28) 
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The  Fisher  information  matrix  can  not  be  obtained  since  log(l[O<0<>])  is  not  differ- 
entiable with  respect  to  (j)  or  A.  The  same  problem  which  we  encountered  with  in 
Example  2 arises  again. 

Example  4 Consider  the  covariate- adjusted  Sarkar  BVE  model.  Wada,  Sen,  and 
Simakura’s  (1996)  ignored  the  parameter  which  is  nonidentifiable  in  the  likelihood 
function,  and  approached  directly  from  the  integrated  likelihood  function.  MLE’s  are 
obtained  from  the  logarithm  of  the  integrated  likelihood  function 

£{  z)/3  - e^Ptj  + l[iJ=i]z'-7  - log(l  + ezj'7)}.  (1.29) 

j= i 

Since  no  closed  form  solution  for  the  MLE’s  of  the  parameters  7 and  f3  is  available, 
one  employs  the  iterative  method,  Newton-Raphson  approach.  Wada  and  Sen  (1993) 
verified  the  consistency  of  the  MLE’s,  i.e.,  n1/2(/3  - /3,7  - 7)  has  asymptotic  mul- 
tivariate normal  distribution  with  mean  vector  zero  and  variance-covariance  matrix 

mi)}-1. 

In  contrast,  a Bayesian  utilizing  both  sample  information  and  non-sample  informa- 
tion makes  inference  on  parameters  from  the  conditional  perspective.  The  non-sample 
information  is  given  by  the  parameter  distribution  functions,  namely,  prior  7r(.)  and 
loss  functions  Ls(.)  (we  shall  assume  square  error  loss  throughout  this  study).  To 
demonstrate  the  idea,  let  d*  and  6 denote  the  data  and  model  parameters,  respec- 
tively. The  Bayesian  inference  is  based  on  the  (proper)  posterior  distribution,  i.e., 
7r(0|d*)  oc  L(G\d*)  7r(0).  The  Bayes  estimator  0B  for  0 is  calculated  by  minimizing 
the  posterior  Bayes  risk  E[LS(619B |d*)],  equivalently,  0B  = En(G\d*).  The  corre- 
sponding posterior  variance  is  E(0|d*)  = E„{9B  - 0|d*)2.  The  nonidentifiability 
problem  can  be  circumvented  via  a suitable  prior  specification.  How  could  7 r(0)  be 
determined  for  competing  risks  models?  We  review  some  of  the  available  literature. 
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Liu  and  Nakao  (1991)  suggested  the  gamma-related  prior  for  the  Marshall-Olkin 
BVE,  i.e., 

tt(Ai,  A2,  A12)  oc  A?1  (A12  + A2)a2A“2(A12  + A0a2A^  e-***i-**>»-<*>™ , (1.30) 

The  derived  Bayes  estimators  for  (Ai,Ai,A12)  and  S(ti,  t2\\i,  Ax,  Ai2)  can  all  be  ex- 
pressed explicitly.  Ideally,  information  for  (d,  ax,  a2,  a3,  a4,  a5,  a6)  may  be  historically 
available.  This  does  not  always  happen  in  practice. 

Lu  (1992)  proposed  the  data-driven  prior  for  the  Marshall-Olkin  BVW  model 
(1.15)  with  equal  shape  parameter  r.  He  considered  a two-stage  prior  given  by 

7Tb  (r  = bj)  = pj 

7rAJ(A1,A2,A12|r  = 6J)  oc  (A12  + A^  (A12  + \2)a*  e-tu>*-tv*-(eis+toj)>nf 

where  {bj,Pj}*=1  and  {aij , &j} (i,j)£{ i,2}  are  estimated  from  the  data.  Explicit  poste- 
rior distributions  are  obtained.  Other  subjective  options,  such  as  the  gamma  prior 
(Berger  and  Sun,  1993)  and  the  joint  location-scale  conjugate  prior  (Bury,  1973)  have 
been  used.  These  do  not  provide  a closed  form  posterior  distribution.  However,  the 
computation  can  be  accomplished  by  numerical  methods,  such  as  Gibbs  sampler  and 
adaptive-rejection  sampler. 


1.4  Noninformative  Priors 


In  view  of  the  difficulty  of  finding  subjective  priors,  one  has  to  restrict  attention 
to  noninformative  priors.  The  simplest  and  the  most  historical  noninformative  prior 
is  attributed  to  Laplace  (1820).  He  vaguely  assigns  equal  probability  on  the  entire 
parameter  space,  i.e.,  7 rj/(0)  oc  1.  However,  a constant  prior  for  one  parameteriza- 
tion will  not  typically  transform  into  a constant  for  another.  Thus,  Jeffreys  (1961) 
proposed  a prior  which  is  invariant  under  any  one-to-one  reparameterization.  It  is  de- 
rived such  that  it  is  proportional  to  the  positive  square  root  of  the  determinant  of  the 
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Fisher  information  matrix.  Let  {lj}  be  independently  distributed  random  variables 
with  identical  pdf  f{.\9).  Denote  1(0)  = ^ E log[/(yj|0)],  then 

i=  1 

7Tj(0)  OC  |I(0)|1/2,  (1.31) 

where  1(0)  = EQ[-d2l(0)  / d6ud6v\. 

Subsequent  derivation  for  noninformative  priors  are  mostly  for  pragmatic  pur- 
poses. For  instance,  interest  lies  in  particular  parameters,  or  sometimes  in  functions 
of  parameters.  Bernardo  (1979)  introduced  the  notion  of  the  reference  priors.  He 
recommended  Q — i?{log[7r(0|y)]  — log[7r(0)]}  as  a measure  of  information  about  6 
provided  by  the  data.  Here  E denotes  expectation  over  the  joint  distribution  of  y and 
0.  The  rationale  behind  is  that  the  larger  this  quantity  Q,  the  less  informative  the 
prior.  The  reference  prior  i tr(0)  is  the  prior  that  maximizes  Q.  To  construct  a refer- 
ence prior,  one  could  follow  the  four-step  scheme  proposed  by  Berger  and  Bernardo 
(1989). 

Step.l  Let  (0i,0|_1|)  be  a partition  of  the  parameter,  where  0 1 are  the  parame- 
ters of  interest,  and  0|_q  are  the  nuisance  parameters.  First,  the  conditional  prior, 
7T  (0|-1|I^),  is  defined  as 

p1(0|_1||01)-|I22(0)|1/2,  (1.32) 

where  the  Fisher  information  matrix  1(0)  is  partitioned  as 


I = 1(0) 


lll(0)  ll2(0) 
l2l(0)  I22(0) 


Step.  2 Choose  a sequence  of  subsets  of  the  parameter  space  Q for  0,  fij  C fi2  C • • •, 
such  that  Ujfh  = D,  and  p1(0|_1||01)  has  finite  mass  on  fL  q = {0|_i|  : (0i,0|_i|)  € 
DJ  for  all  0i.  Normalize  p1(0|_1||0i)  on  each  fh  q to  obtain 


P2l)(0|-i||0i)  - ^(0i)Pi(0|-i||0i)l[0|  i|6n  ^ 


(1.33) 
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where  fc<(0i)  = {/n  Pi(0|_i||0i)  d0|-i|}  l. 

i,ts  i 

Step.  3 Find  the  marginal  reference  prior  (assuming  the  existence)  for  6\  with  respect 
to  |0i).  That  is 


Pz\Oi)  = exp 


{~2ket 


P{2]{d |_i||0i)  log 


I(^) 

I22(g) 


dG 


l-i| 


(1.34) 


Step. 4 Define  the  reference  prior  for  (0X,  0j_i|)  by 


(1.35) 


assuming  the  limit  exists  and  G\  is  a fixed  point. 

Remark:  When  there  is  no  nuisance  parameter,  i.e.,  6 |_i|  is  an  empty  vector,  (1.35) 
equals  to  Jeffreys’  prior. 

A somewhat  different  criterion  for  developing  noninformative  priors  is  based  on 
matching  the  posterior  coverage  probability  of  a Bayesian  credible  set  with  the  corre- 
sponding frequentist  coverage  probability  with  a reminder.  The  matching,  discussed 
further,  is  accomplished  through  either  posterior  quantiles  or  the  highest  posterior 
density  (HPD)  ellipsoid.  Matching  based  on  one-dimensional  posterior  quantiles  with 
order  o(n~1/2)  was  first  investigated  by  Welch  and  Peers  (1963).  Let  0[a)(7r,d*)  de- 
note the  ath  posterior  quantile  for  6\.  If  a prior  n(B)  satisfies  P{6\  < #iQ)(7r,  d*)|0}  = 
aH -o(n_1//2),  it  is  called  the  first  order  probability  matching  prior,  and  is  characterized 
by 

t ^[(/urI/2/“ir(e)]  = 0,  (1.36) 

where  I-1  = ((/IJ)).  For  k = 1,  the  unique  solution  7 r(0)  for  (1.36)  is  Jeffreys’  prior. 
Tibshirani  (1989)  characterized  the  class  of  the  first  order  probability  matching  priors 
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when  P1  = 0 V j = 2,  • • • , k.  For  g(92,  • • • , 9k)  > 0,  the  solution  for  (1.36)  reduces  to 

n(0)<xP1(0)g(92,---,6k).  (1.37) 

For  better  approximation,  Dey  and  Mukerjee  (1993)  and  Mukerjee  and  Ghosh 
(1997)  developed  the  second  order  probability  matching  prior  with  remainder  of  order 
o(n~l).  This  development  helps  narrow  down  the  selection  of  priors,  and  quite  often 
leads  to  a unique  prior  within  the  first  order  probability  matching  priors.  In  addition 
to  (1.36),  it  is  obtained  by  solving 


t £ tttt  T{LJ.„^(3/“*-2r”)x(0)}  = 0,  (1.38) 


d2 


k k k k 


jrirtl  dOjde, 


3 v=l  j-ir=lu=l  d6v 


where  Pr  = P1Irl/P1,  and  Ljru  = EQ{d3l(9) /d6jd6rd9u} . We  simplified  (1.38)  for 
a diagonal  1(0)  to 


lg^’ 


,0k) 


d_ 

del 


[In*'2 


Li. 


i,i 


k 

+ E 


—\rl/2 

u= 1 Mu  U 


Inn  L 


llu 


9(0  2,---,0k)]=O.  (1.39) 


Further  results  in  this  area  were  pursued  by  Datta  (1996).  Datta  (1996)  derived  nec- 
essary and  sufficient  conditions  for  a prior  matching  the  marginal  coverage  probability 
(up  to  0(n~1))  for  each  parameter  of  interest  to  be  jointly  probability  matching.  He 
refers  to  such  priors  as  jointly-probability-matching  priors.  In  particular,  if  1(0)  is 
diagonal,  the  conditions  are  automatically  satisfied. 

An  alternative  approach  is  to  ensure  the  frequentist  validity  of  the  highest  pos- 
terior density  region  (HPD),  which  provides  a matching  prior  when  the  parameter  is 
multi-dimensional.  Ghosh  and  Mukerjee  (1993)  characterized  the  HPD  prior,  7r mh(0) 
with  matching  up  to  o(n_1),  as  a solution  of  the  differential  equation 


EE 


d_ 

dOj 


k k k k 

{Prvr(e)}  + Y.Y.Y.Y. 


u=l j= 1 r=l u=l 


d_ 

W, 


{IvrPuLj<runMH(0)}  = 0, 


(1.40) 
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where  7 rr(0)  = dir (0)/d9r  and  Ljjru  = EQ[{dl(9)/d9j}{d‘2l(0)/d9rddu}],  for  1 < 
j,r,u  < k.  The  simplification  on  (1.40)  for  a diagonal  1(0)  is  then  given  by 


k 


E 


d_ 

d6r 


{/"">(«)}  + E E 


r= 1 u=l 


d_ 

ddT 


{rruLUJU-rr(G)}  = 0. 


(1.41) 


1.5  Research  Proposal 


To  the  extent  of  Bayesian  method  and  model  generalization  in  the  competing  risks 
framework,  this  study  will  focus  on: 


• Bayesian  approach  is  often  suitable  in  overcoming  the  nonidentifiability  of  the  like- 
lihood. Noninformative  priors  are  often  considered  as  acceptable  substitutes  when  no 
informative  prior  is  available.  We  shall  also  consider  informative  priors  in  Chapter  2 
to  resolve  the  nonidentifiability  problem. 

• The  reparameterization  invariant  noninformative  priors,  such  as  Jeffreys’  prior  and 
the  reference  priors,  are  particularly  preferable.  Such  priors,  however,  are  derived 
based  on  a fully-specified  and  nonsingular  Fisher  information  matrix.  They  cannot 
be  directly  applied  to  the  competing  risks  data  due  to  the  singularity  of  the  Fisher  in- 
formation matrix  (see  Example  2)  or  its  inability  to  differentiate  on  the  log-likelihood 
with  respect  to  some  of  the  parameters  (see  Example  3).  To  overcome  this  problem, 
we  suggest  the  stagewise  prior  elicitation  in  Chapter  2.  This  technique  is  extended 
as  well  to  the  generalized  BVE  models  introduced  in  Chapter  3,  where  the  baseline 
model  is  adjusted  for  covariates.  These  stagewise  priors  will  be  further  examined  by 
the  probability  matching  criteria  for  comparison  purpose. 

• Posterior  distributions  are  the  cornerstone  of  Bayesian  inference.  It  is  important 
that  posteriors  are  genuine  probability  distributions.  In  both  these  chapters,  we  shall 
verify  the  propriety  of  the  posteriors,  and  implement  the  Bayesian  method  for  some 
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real  life  data  using  the  Markov  Chain  Monte  Carlo  method.  In  addition,  model 
adequacy  will  be  tested  through  Bayesian  posterior  goodness-of-fit  procedure. 

• A more  flexible  class  of  models  for  the  competing  risks  outcomes  is  suggested  in 
Chapter  4.  One  of  the  advantages  of  the  proposed  methodology  in  this  chapter  is  that 
it  always  provides  an  identifiable  likelihood.  Noninformative  priors  can,  nevertheless, 
be  used  fruitfully. 


CHAPTER  2 

BAYESIAN  ANALYSIS  OF  SELECTED  BIVARIATE  EXPONENTIAL  MODELS 

2.1  Introduction 


Bivariate  exponential  (BVE)  distributions  were  developed  primarily  for  modeling 
correlated  bivariate  lifetimes.  They  have  been  used  widely  also  for  the  analysis  of 
competing  risks  data  when  the  component  risks  are  dependent.  For  such  analysis, 
frequentist  approach  often  runs  into  difficulty  due  to  nonidentifiable  likelihood.  With 
an  end  to  overcome  the  nonindentifiability,  recent  literature  (cf.  Section  1.3)  has  been 
geared  towards  Bayesian  analysis  with  subjective  priors.  However,  systematic  prior 
elicitation  is  often  difficult.  This  chapter  focuses  instead  on  Bayesian  analysis  with 
noninformative  priors. 

In  our  competing  risks  framework,  we  consider  the  subfamily  of  BVE  models  for 
which  the  time  to  failure  T is  independent  of  the  associated  the  indicator  variable  / 
denoting  the  cause  of  failure.  We  will  provide  several  examples  of  this  in  Section  2.2. 
In  our  framework,  each  single  datum  consists  of  three  pieces  of  information:  the  time 
to  failure  T,  the  nonsingularity  indicator  'll  denoting  whether  or  not  two  components 
have  simultaneously  failed,  and  the  cause-specific  indicator  A.  Due  to  the  nature  of 
data  and  BVE  model  structure,  quite  often  we  are  led  a likelihood  function  with  a 
nonregular  Fisher  information  matrix  impeding  thereby  the  calculation  of  standard 
noninformative  priors,  such  as  Jeffreys’  prior  and  its  variants.  This  means  that  it 
is  necessary  to  modify  the  priors.  In  Section  2.3,  a stagewise  noninformative  prior 
elicitation  strategy  is  proposed,  and  is  utilized  for  a general  class  of  nonsingular 
BVE  models.  A variety  of  noninformative  priors  are  developed,  and  are  used  for 
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data  analysis.  In  Sections  2.4  and  2.5,  we  demonstrate  Bayesian  applications  for  the 
Marshall-Olkin  BVE  and  selected  ACBVE’s  in  detail.  In  addition,  we  compare  the 
performance  of  newly  developed  noninformative  priors  in  Section  2.6,  particularly 
those  based  on  the  frequentist  probability  matching  criterion.  In  Section  2.7,  we 
provide  a simulation  study  to  demonstrate  the  versatility  of  Bayesian  procedures. 

2.2  Notation 


The  pair  (T,  I ) actually  reflects  three  facts:  time  to  failure,  cause  of  failure  and 
singularity  (or  nonsingularity).  For  this  reason,  we  define  a triplet  that  is  equivalent 
to  (T,  I).  Let  4>  = 1 if  / = 1 exclusively  or  I = 2 exclusively,  4>  = 0 if  I = 1 and  1 = 2 
simultaneously,  A = 1 for  / = 1 given  41  = 1,  and  A = 0 for  I = 2 given  4'  = 1.  Both 
A and  4/  have  interesting  interpretations  in  the  current  context:  A is  interpreted  as 
the  cause-specific  indicator,  and  4'  is  the  nonsingularity  indicator.  One  should  keep 
in  mind  that  P(4'  = 1)  = 1 under  the  assumption  of  nonsingularity.  Therefore,  in 
this  case,  we  shall  not  redundantly  retain  4b  Following  the  well-defined  relationship 
between  / and  (A,  4f),  the  triplet  presentation  is  given  by  (T,  A,  4').  Here  are  some 
notions  we  shall  use  mostly  in  this  study. 

• n:  sample  size. 

• (ti,6i,ipi):  realization  of  (7),  A*,  4q),  for  1 < i < n. 

• dj  = (£;,  5i,  ith  observed  triplet. 

• d = {dj  : 1 < i < n}:  the  entire  dataset. 

n 

• tt\  realization  of  Tt  = X)  7). 

t=i 

n 

• rii : realization  of  Ni  = X!  Aj4q. 

i= 1 
n 

• n2:  realization  of  N2  = XI  (1  — Ai)4b- 

i=  1 

n 

• ni2:  realization  of  IV12  = (1  ~ 4/j). 

i= 1 
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• 9:  model  based  parameter  vector. 

• 9i\  the  ith  element  of  6. 

• l[Ep  indicator  function  of  event  E. 

• f(x\9):  probability  density  function  (pdf)  of  x given  parameters  0. 

• F(x\0):  J^fm  dy. 

• L(9\d):  likelihood  function  of  6 given  data  d. 

• P(.):  probability  measure. 

• 7 r(9):  prior  density  of  6. 

• 7r(0|d):  posterior  density  of  9 given  data  d and  prior  tt. 

• 0[Aj]:  a block  diagonal  matrix  with  ith  diagonal  element  A*. 


2.3  Noninformative  Prior  Modification 


The  Fisher  information  matrix  has  so  far  been  the  cornerstone  in  the  development 
of  a wide  class  of  noninformative  priors.  However,  for  most  competing  risks  models, 
the  likelihood  function  of  (T,  A,  T)  generated  from  ACBVE’s  becomes  nonidentifiable, 
thereby  making  the  direct  calculation  of  the  Fisher  information  matrix  invalid.  We 
begin  with  a class  of  likelihood  functions.  Assume  that  {D*  : 1 < i < n}  are 
a sequence  of  independent  random  vectors  from  a common  distribution  with  pdf 
f(.\9).  Let  d*  = {d*  : 1 < i < n)  be  the  corresponding  realization  of  {D*  : 1 < i < 
n}.  Write  9 — (0! , as  a partition  of  9.  Suppose  that  the  likelihood  function 

L(0|d*)  = n f{d*\9)  is  given  by  the  expression 

i=i 


mi')  = i,(9i|d-)i„(e|_,||d-)l[e|_ii6R(9i)].  (2,1) 

where  L\(.)  and  L0(.)  are  some  smooth  differentiable  functions,  R(9 1)  defines  the 
range  for  9 [_i|,  which  may  depend  on  9\.  In  case  Lo(0|_i||d*)  is  independent  of  0|_i|, 
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and  1 ^ 1|efl(0j)]  is  present,  the  logarithm  of  (2.1)  is  nondifferentiable  with  respect  to 

any  element  of  #|-i|.  Hence  the  Fisher  information  matrix  does  not  exist.  As  a conse- 
quence, noninformative  priors  such  as  Jeffreys’  prior,  reference  priors,  or  probability 
matching  priors  become  impossible  to  calculate.  This  phenomenon  as  mentioned  ear- 
lier is  present  for  competing  risks  models  originating  out  of  ACBVE’s,  e.g.,  Freund’s 
ACBVE  and  Block-Basu’s  ACBVE.  To  overcome  this  technical  difficulty,  we  propose  a 
two-stage  noninformative  prior  elicitation  scheme.  This  scheme  stems  from  the  idea  of 
the  prior  decomposition  (Berger  and  Bernardo,  1989),  i.e.,  7 r(0)  = 7r0(^|— 1|  |0i)7r1(0i). 
The  two-stage  eliciting  procedure  is  stated  follows.  Given  9i,  a “proper”  noninfor- 
mative prior  7To(0|_i||0i)  is  assumed  at  the  first  stage.  The  marginal  noninformative 
prior,  7Ti(0i)  say,  is  then  derived  from  the  marginal  likelihood  function 

f L(0|d>o(0|_i||0i)  dO ,_i|  oc  |d*).  (2.2) 

JR{  Vi) 

The  above  holds  trivially  for  any  proper  7To(0|_i||0i).  Furthermore,  the  propriety  of 
7ro(0|_i||0i)  is  essential  to  guarantee  both  the  existence  of  Li(6i\d*)  and  the  validity 
of  the  second  stage  procedure.  In  the  second  stage  specification,  using  Li(0i|d*)  as 
the  base,  we  consider  Laplace’s  prior,  Jeffreys’  prior,  and  the  reference  prior  as  default 
priors. 

A general  expression  for  the  likelihood  function  based  on  a variety  of  BVE’s  is  in 
the  form  of  (2.1)  and  is  given  by 

L(A,p,0|d)  oc  Ane~Mtpni(l  - p)n2To(</>|d)l[06jR(M],  (2.3) 

where  A is  the  hazard  rate  for  T and  p is  the  probability  of  failing  from  cause  1.  For  the 
Marshall-Olkin  BVE,  L0((p |d)  a cf)ni+n2(  1 — </>)ni2,  and  i?(A,  p)  is  free  from  (A,  p).  For 
ACBVE’s,  L0(<p |d)  = 1 and  is  present.  Noninformative  prior  calculation 

as  introduced  in  Section  1.4  is  possible  for  the  Marshall-Olkin  BVE.  For  ACBVE 
models,  however,  priors  are  found  by  following  the  two-stage  scheme  as  described 
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above.  We  shall  illustrate  these  two-stage  priors  for  ACBVE  models.  First,  however, 
we  begin  with  the  Marshall-Olkin  BVE. 

2.4  Bayesian  Analysis  for  Marshall-Olkin  BYE 


Suppose  that  lifetimes  of  sampled  objects  are  subject  to  two  competing  risks, 
where  the  chance  of  simultaneous  failure  is  nonzero,  i.e.,  P(Ni2  > 0)  > 0.  Suppose 
that  the  underlying  model  is  the  Marshall-Olkin  BVE.  Then  the  likelihood  function 
under  its  original  parameterization  is  given  in  (1.10).  We  consider  the  following  repa- 
rameterization based  on  two  primary  objectives:  the  likelihood  function  has  the  ex- 
pression compatible  to  (2.1),  and  the  new  parameters  address  certain  model  features. 
Here  we  choose 


A — Ai  + A2  + A12,  p — 


Ai 

Ai  + A2 


0 


Ai  + A2 

Ai  + A2  + A 
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These  new  parameters  A,  0,  and  p are  interpreted  as  the  instantaneous  failure  rate 
of  T,  the  probability  of  failing  from  a single  cause,  and  the  conditional  probability  of 
failing  from  cause  1 given  a single-cause  failure,  respectively.  Given  ni,  n2,  ni2,  and 
tt,  the  joint  likelihood  function  of  (A , p,  0)  is  given  by 


T(A,p,0|d)  oc  Ane"Aitpni(l  - p)n20ni+”2(l  - 0)ni2,  (2.4) 

which  corresponds  to  the  likelihood  function  (2.1)  by  setting  6\  = (A,  p),  0|_i|  = 0, 
Li(0i |d)  = Lx(A,p|d)  oc  Ane-At‘pni(l  - p)"2,  and  Lo(0|-i||d)  = T0(^|d)  oc  0”1+"2(  1 - 
0)”12.  On  the  other  hand,  by  the  independence  of  T and  (A,  \k),  one  can  write 

L(A,p,0|d)=Lc(A|d)L<i(p,0|d),  (2.5) 

where  Lc(A|d)  oc  A”e_Atf,  and  Ld(p,  0|d)  oc  pni(l  - p)n20"1+”2(l  - 0)ni2.  This  demon- 
strates the  likelihood  independence  and  consequent  parameter  orthogonality.  We 
derive  now  certain  noninformative  priors  based  on  (2.5). 
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2.4.1  Jeffreys’  Prior  and  Probability  Matching  Property 


Initially,  the  full  Fisher  information  matrix  is  calculated  as 


I(A  = 


Ic(^)  0ix2 

02x1  Irf(P,  <P) 


where  0Xi!/  denotes  the  zero  matrix  of  dimension  x x y,  IC(A)  — X 2,  and 


I d{p,(t>)  = 


<t>P  *(1  - p)  1 0 

0 (p-\\-<p)-x 


Thus,  Jeffreys’  prior  is  given  by 


717  = T/( a,  P,  (p)  oc  njtC(X)TTj4(p,  (p), 


where 


flj.c(A)  oc  A 1, 

vjApA)  oc  - 0)-5. 


(2.6) 


(2.7) 

(2.8) 


Following  the  arguments  due  to  the  orthogonality  between  A and  (p,  <p),  Jeffreys’  prior 
given  in  (2.6)  is  the  unique  second  order  probability  matching  prior  for  A.  In  Section 
2.7,  we  will  have  more  extensive  discussions  on  this  matching  property. 


2.4.2  Reference  Prior 


The  reference  prior  due  to  Bernardo  (1979)  and  Berger  and  Bernardo  (1989,  1992) 
is  appropriate  when  a subset  of  (A,  p,  (p)  is  the  parameters  of  interest  while  the  comple- 
mentary subset  consists  of  the  nuisance  parameters.  The  four-step  formulae  by  Berger 
and  Bernardo  (1989)  is  used  to  compute  the  reference  priors.  Let  12  = {(A,p,  (p)  : 
A > 0,  p G (0,1),  € (0,1)}.  Select  12*  = {(*_1,1  — z_1)  x (i-1,l  — i~l)  x (i-1,*)} 
as  the  sequence  of  compact  subspaces  of  12.  Let  0/  and  0^  be  generic  symbols  for 
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the  parameters  of  interest  and  the  nuisance  parameters,  respectively.  The  resulting 
reference  priors,  denoted  by  7 xRef,  with  respect  to  all  possible  nontrivial  parameter 
vector  partitions  are  presented  in  Table  2.1. 


Table  2.1.  Reference  Priors  for  the  Marshall-Olkin  BYE 


Oi 

A 

(P>  <t>) 

P 

(0,  A) 

4> 

(Yp) 

On 

(p,  <t>) 

A 

(<A,A) 

P 

(A,p) 

<t> 

KRef{0) 

7IJ 

TO 

7T  J<\>~* 

7Tjd_2 

Reference  priors  do  not  often  agree  with  Jeffreys’  prior.  This  can  be  found  in 
columns  4-7  of  Table  2.1.  The  coincidence  occurs  when  the  parameter  vector  is 
partitioned  according  to  the  variable  type  (discrete  or  continuous)  that  the  parameter 
is  associated  with.  Columns  2-3  in  Table  2.1  exemplify  this  result. 

2.4.3  Propriety  of  Posteriors  and  Bayesian  Inference 

It  is  necessary  to  check  the  propriety  of  posteriors  under  the  above  noninformative 
priors.  Let  7Tr  oc  Define  the  beta  (proper  or  improper)  pdf 


n(x-,a,b)ocxa  X(1  — x)b  Y 

(2.9) 

Then  the  noninformative  priors  7 tj,  t;r,  and  7 py  = 7ru(X,p,<t>)  oc  1 

are  all  expressible 

in  the  form 

7r(A;  ax,  1)7 r(p;  ap,  bp)n((j);  a 0, 6^). 

(2.10) 

Specifically, 

tti/(A,P,</>)  a 7r(A;  1,  l)7r(p;  1,  l)7r(0;  1, 1), 

(2.11) 

TTj(X,p,(j))  oc  7r(A;O,l)7r(p;.5,.5)7r(0;l,.5), 

(2.12) 

7Tft(A,p,0)  oc  7t(A;0,  1)7t(p;  .5,  .5)7r(0;  .5,  .5). 

(2.13) 

32 


The  following  theorems  are  provided  to  characterize  a subclass  of  priors  from 
(2.10),  for  which  the  associated  posteriors  are  proper,  and  then  find  out  whether  the 
three  candidate  priors  belong  to  this  subclass. 

Lemma  2.1  Suppose  that  the  likelihood  function  and  prior  density  are  given  in  (2.4) 
and  (2.10),  respectively.  Then  the  posterior  distribution  under  this  prior  is  proper  if 
and  only  if  a\  + n > 0,  ap  + ni  > 0,  bp  + n2  > 0,  a#  + n\  + n2  > 0,  and  b $ + n 12  > 0. 
Proof  From  (2.4)  and  (2.10)  the  joint  posterior  is  given  by 

7r(A,p,  0|d)  a A"+aA-1e“Attp"1+0p-1(l  — p)n2+bp~l(j)n i+^+a*-1^  _ (2.14) 

The  propriety  follows  the  properties  of  beta  and  gamma  integrals. 

Theorem  2.1  Suppose  that  {(tj,^,^)  : 1 < i < n}  are  independent  pairs  from 
the  Marshall-Olkin  BVE  model.  Then  the  posterior  distribution  under  each  of  non- 
informative  priors,  -Ku,  kj,  and  7r r,  is  proper  if  and  only  if  n\  > 1,  ri2  > 1,  and 
nl2  > 1. 

The  proof  of  this  Theorem  follows  from  Lemma  2.1  and  (2. 1 1)-(2. 13) . 

From  (2.14),  one  can  recognize  that  A,  p and  f>  have  independent  posterior  distri- 
butions of  standard  forms.  Table  2.2  summarizes  these  posterior  distributions  under 
the  selected  noninformative  priors. 


Table  2.2.  Posterior  Distributions  for  the  Marshall-Olkin  BVE 


e 

A 

P 

<t> 

70/(0  |d) 
TTj  (0|d) 
7Tfl(0|d) 

G(n  + 1,  t() 
G(n,tt) 
G(n,tt) 

B(ni  + 1,  ri2  + 1) 
B(ni  + i,n2  + 4) 
B{n  i + | ,n2  + |) 

B(ji\  + n2  + 1, 7ii2  + 1) 
B(n  i + n2  + 1,tjj2  + 4) 
B(ni  + n2  + ni2  + 0 

2.5  Bavesian  Analysis  for  the  ACBVE  Models 


As  pointed  out  earlier,  the  Marshall-Olkin  BVE  is  singular  since  the  chance  of 
simultaneous  failure  is  positive.  This  distribution  has  been  modified  in  many  ways 
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to  yield  nonsingular  BVE’s.  Examples  of  these  are  Freund’s  ACBVE,  Block-Basu’s 
(B-B)  ACBVE,  Sarkar’s  ACBVE,  and  Gumbel-B’s  ACBVE  (cf.  Chapter  1).  For  all 
these  models,  after  suitable  reparameterization,  the  new  likelihood  can  be  presented 
in  the  form  (2.1).  Let  (A , p,  0)  be  a reparameterization  for  the  parameter  vector  in 
the  original  model  such  that  A (A  > 0)  and  p (0  < p < 1)  represent  the  hazard 
rate  for  T and  the  probability  of  observing  A = 1,  respectively.  The  new  parameter 
vector  0,  denoting  the  complement  of  (A,p)  under  the  reparameterization,  falls  in 
the  parameter  space  R(X,p).  The  relationship  between  G and  (A,  p,  0),  and  0 and 
R( A,  p)  can  be  viewed  from  Table  2.3  given  below. 


Table  2.3.  The  ACBVE  Reparameterization 


Model 

B-B  & Sarkar 

Gumbel-B 

Freund 

A 

Ai  + A2  + A12 

1 

S-  CN 

+ 

T 

t-  i-H 

ct  + (3 

Ai 

V 

a 

P 

A1+A2 

At  *+A$  1 

a+P 

4> 

Ai  + A2 

r 

(a  — a,  p'  — p) 

R(x,  P) 

(0,  A) 

(0,1) 

(0,  A - A p)  x (0,  Ap) 

The  likelihood  function  of  a ACBVE  is  given  by  the  general  form 

L(A,p,0|d)  oc  A"e-At‘pni(l  - p)”2l[0efi(A)P)].  (2.15) 

Relating  (2.15)  to  (2.1),  one  sees  that  Gx  = (A,  p),  0|_i|  = 0,  Li(&i |d)  = Zq(A,p|d)  oc 
A"e-^pni(l  - P)n2  and  Lo(0|-i,|0i,d)  = Lo(0|_n|d)l[0|_i|6fl(0i)]  oc  1 [0e/J(A,p)]- 

This  inevitable  nonregularity,  attributed  to  the  nature  of  competing  risks  data  and 
ACBVE  model  assumption,  blocks  one  from  finding  the  noninformative  priors  arising 
out  of  the  Fisher  information  matrix.  As  a remedy,  the  two-stage  prior  elicitation 
strategy  proposed  in  Section  2.2  is  used  to  find  noninformative  priors.  At  the  first 
stage,  one  assumes  /kUi  = /Kul  (0|A,  p)  oc  1 for  0 e R(X,p).  Then  the  integrated 
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likelihood  is  given  by 

J L( A,  p,  <f> |d)7rVl  d(j)  = Li  (A,  p|d)  oc  A ne~Mtpni  (1  - p)"2 . (2.16) 

With  (2.16),  the  second-stage  marginal  priors  are  derived  accordingly. 

2.5.1  Marginal  Jeffreys’  Prior  and  Marginal  Reference  Prior 


We  first  recognize  the  decomposition  of  Li(A,p|d)  as 

Li(A,p|d)  = L1>c(A|d)LM(p|d), 

where 


(2.17) 


L1>c(A|d)  oc  Ane-A\  (2.18) 

LM(p|d)  oc  pni(l-p)n2.  (2.19) 

The  Fisher  information  matrix,  Ii  = Ix  (A,  p) , derived  from  (2.16)  has  the  form 


Ii,c(A)  0 

0 UM 


where  Ii,c(A)  = A 2 and  Ii,d(p)  = p X(1  — p)  1 are  derived  from  (2.18)  and  (2.19), 
respectively.  Thus,  the  second-stage  Jeffreys’  prior  is  given  by 

T/2(A,p)  = 7Tj2,c(A)7rj2id(p),  (2.20) 

where 

7Tj2iC(A)  oc  A-1,  (2.21) 

T/2,d(p)  oc  p-5(i_p)-§.  (2.22) 

Since  Ii(A,p)  is  diagonal,  the  second-stage  reference  prior  is  also  the  same  as 
Jeffreys’  prior  irrespective  to  the  choice  of  the  parameter  of  interest.  Thus,  we  propose 
the  two-stage  prior 


nju  = T/n(A,  p,  (f>)  oc  A lp  *(l-p)  2 l[«pe/s(A,p)] - 


(2.23) 
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Due  to  the  factoring  of  the  likelihood  in  (2.17)  and  the  independence  of  A and  p,  prior 
' kju  is  thus  probability  matching  for  A. 

A second  two-stage  prior  using  Laplace’s  prior  for  both  A and  p at  the  second 
stage  is  given  by 

*uu  oc  1 [0GjR(a,p)].  (2.24) 

2.5.2  Propriety  of  Posteriors  and  Bayesian  Inference 

Both  priors  nju  and  nVu  found  in  the  previous  section  are  improper.  We  must 
examine  the  propriety  of  the  resulting  posteriors.  Observe  the  fact  that  i?(A,p)  is 
compact  regardless  of  which  ACBVE  model  is  chosen.  Note  that  both  nju  and  nuu 
belong  to  the  general  class  of  priors 

^(A)  a\i  bX )7I’(P)  api  bp) (2.25) 

where  7r(A;  ax,  bx)  and  7r(p;  ap,  bp)  are  defined  in  the  previous  section.  The  resulting 
joint  posterior  is  given  by 

tt(A,  p,  </>|d)  oc  <\"+a*-1e-AttpB1+0'-1(l  - p)"2^-1  1 (2.26) 

For  nju-,  these  parameters  are 

(aA,  by,  ap,  bp)  = (0, 1;  .5,  .5).  (2.27) 

For  7Tuu,  these  parameters  are 

(ax,bx;ap,bp)  = (1,1;  1,1).  (2.28) 

The  propriety  of  the  posterior  given  in  (2.26)  is  a consequence  of  the  following  theo- 
rems. 

Lemma  2.2  Suppose  that  ax  + n > 0,  ap  + rq  >0,  and  bp  + n2  > 0.  Then  the 
posterior  given  in  (2.26)  is  proper. 
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Proof  Integrating  out  (2.26)  with  respect  to  0,  one  obtains  the  marginal  posterior 
given  by 

tt(A,  p|d)  a AB+a»-1e-AV1+a'-1(l  - pT2+bp _1-  (2.29) 

Under  the  assumptions,  the  propriety  follows  from  finiteness  of  beta  and  gamma 
integrals. 

The  following  theorem  is  now  a consequence  of  the  above  lemma. 

Theorem  2.2  Let  {(U,5j)  : 1 < i < n}  have  the  joint  likelihood  function  given  in 
(2.15).  -kju  and  nuu  are  the  two  stagewise  priors.  The  resulting  posterior  densities 
are  proper  if  and  only  if  n\  > 1 and  n2  > 1. 

In  Table  2.4,  we  summarize  the  posterior  distributions  of  A and  p.  Posterior 
distributions  for  0 can  not  always  be  explicitly  obtained.  These  are  mixtures  of 
standard  distributions  if  available.  Nevertheless,  all  posterior  moments  are  finite  and 
can  be  obtained  in  closed  forms.  These  results  are  presented  in  Table  2.5. 

Table  2.4.  Posterior  Distributions  for  A and  p of  ACBVE’s 


9 

A 

P 

7ruu(9\d) 
*ju{0  |d) 

G(n  + 1,  tt) 
G(n,tt) 

B{ri\  + 1,  n2  + 1) 
B(n\  + 5,  n2  + ^) 

Table  2.5.  Posterior  Distributions  (Moments)  for  0 of  ACBVE’s 


ACBVE 

B-B  & Sarkar 

Gumbel-B 

Freund 

<t> 

Ai  + A2 

r 

d\  = a — a 

d2  = 0 -13 

7r(0|d) 

KUU 

GW(i() 

U0,i 

- 

- 

Kju 

U0,i 

- 

- 

£(0  |d,7r) 

nuu 

( n + l)/2  tt 

1/2 

q(u)(n,n2,tt) 

~~WT( TT 

Kju 

n/2tt 

1/2 

q'f>(n,n2,tt) 

qtf)(n,ni,tt) 

U(0|d,7T) 

KUU 

(n  + l)(n  + 5)/12  tt 

1/12 

qu)(n,n2,tt ) 

qu)(n,nutt ) 

Kju 

n(n  + 4)/12fj 

1/12 

qjV)(n,n2,tt ) 

qj/)(n,ni,tt) 
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where 

U0A  Uniform  (0, 1) 


G^m\x) 

m 

~ m~l  t,  G(i,x), 

i=  1 

q{u\m,k,x) 

m A:+l 
— m+ 2 2x  ’ 

qjE){rn,k,x ) 

_ m— 1 

m+1  2x  ’ 

q[u\m,k,x ) 

_ m+1  fc+1  / 
m+2  2x2  \ 

(771-1-2)  (fc-(-2) 
„ 3(m+3) 

(m+l)(A:+l)  \ 
4(m+2)  ) 

Qj]{m,k,x) 

_ m k+h 

( (m+l)(fc+f) 

m(fc+i)\ 

m+1  2x2 

l 3(m+2) 

4(m+l)  ) 

2.6  Prior  Performance 


As  a means  to  evaluate  the  performance  of  the  proposed  noninformative  priors, 
we  propose  comparing  the  coverage  probabilities  of  Bayesian  credible  intervals  with 
the  corresponding  frequentist  confidence  intervals  for  parameters  A,  p,  and  0 (of 
the  Marshall-Olkin  BVE).  Basically,  we  shall  be  looking  at  the  closeness  between 
the  Bayesian  posterior  tail  probability  and  the  corresponding  frequentist  coverage 
probability  for  the  parameter  of  interest. 

Write  6 = (#i,0|_1|)  as  before,  where  9X  is  a scale  parameter.  Without  loss 
of  generality,  let  9 x be  the  parameter  of  interest.  Define  the  posterior  distribution 
function  of  9\  under  prior  n as 

F*(v)  = < i/K>  *■)•  (2-30) 

Also,  let  $[QJ(d*)  denote  the  corresponding  the  posterior  a-quantile  of  9X  under  prior 
7r  and  data  d*  of  size  n,  namely, 


F,(«i“»(d;))  = a. 


(2.31) 
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We  say  that  7r  is  a first  order  probability  matching  prior  if 

P(9i  <#g(d;)|e)  = a + o(7l-i),  (2.32) 

while  7r  is  a second  order  probability  matching  prior  if 

m<<,(d;)|9)=a  + o(n-1).  (2.33) 

First,  we  consider  the  parameter  A from  the  BVE  model.  Recall  that  Tt\X  ~ 
G(n,  A).  From  Tables  2.2  and  2.4,  for  a typical  prior  7r(A;  a\,  b\),  the  posterior  is 
G(n  + a\,tt).  In  particular,  (a^,^)’ s associated  with  7 p/,  ttj,  nR,  ttjju,  and  nju  equal 
to  (1, 1),  (0, 1),  (0, 1),  (1, 1),  and  (0, 1),  respectively.  Let  A ,j“)(£t)  denote  the  posterior 
a-quantile  of  A given  tt  and  prior  tt.  Thus, 

a - P(X  < Xia\tt)\tt) 

= P{ XU  < &'),  (2.34) 

where  ^ denotes  the  cx-quantile  of  the  distribution  G(n  + a\ , 1).  The  corresponding 
frequentist  coverage  probability  is  given  by 

J P( A < At“>(T,)|A)  dFTi  = P(G*  < fia>),  (2.35) 

where  FTt  denotes  the  probability  distribution  of  Tt  and  G*  ~ G(n,  1).  The  above 
equality  suggests  that  the  frequentist  coverage  probability  can  be  affected  by  both  a 
and  a\  besides  the  sample  size  n.  The  exception  occurs  only  when  a\  — 0.  Both  (2.34) 
and  (2.35)  lead  to  the  same  value  a.  Then  the  probability  matching  for  A is  perfect 
irrespective  to  the  choice  of  n and  a.  For  the  BVE  examples,  one  finds  that  the  perfect 
matching  for  A is  achieved  when  nj  or  nR  is  selected  for  the  Marshall-Olkin  BVE, 
or  7Tjj/  is  selected  for  the  ACBVE’s.  Utilizing  (2.35),  the  frequentist  probabilities 
at  selected  posterior  quantiles  are  computed  in  SPLUS.  Theoretical  and  simulation 
results  of  this  computation,  presented  in  Tables  2.6  and  2.7,  respectively,  can  then 
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be  used  as  a reference  for  evaluating  the  probability  matching  performance.  Column 
3-11  provide  the  frequentist  distributions  evaluated  at  the  posterior  a-quantiles  (a  = 
.05,  .2,  .3,  .4,  .5,  .6,  .7,  .8,  .95)  given  n (n  = ^u,^uu,^j,^ju)  and  samples  of  size  n 
(■ n = 10,20,50,100). 


Table  2.6.  Exact  Frequentist  Coverage  Probabilities  for  A of  BVE’s  under  nu,  t^uu, 
nj,  and  7 tju 


n 

7 r 

a = .05 

a = . 2 

Q = .3 

a = .4 

a = .5 

a = .6 

Q = .7 

a = .8 

a = .95 

5 

7T  u(nuu) 
njinju) 

.1244 

.0500 

.3523 

.2000 

.4711 

.3000 

.5753 

.4000 

.6684 

.5000 

.7521 

.6000 

.8275 

.7000 

.8949 

.8000 

.9791 

.9500 

20 

Ku(nuu) 

kj(kju) 

.0795 

.0500 

.2701 

.2000 

.3829 

.3000 

.4883 

.4000 

.5879 

.5000 

.6820 

.6000 

.7710 

.7000 

.8548 

.8000 

.9682 

.9500 

50 

nu(nuu) 

njiirju) 

.0670 

.0500 

.2427 

.2000 

.3514 

.3000 

.4556 

.4000 

.5561 

.5000 

.6530 

.6000 

.7466 

.7000 

.8364 

.8000 

.9625 

.9500 

100 

nu{nuu) 

nj{nju) 

.0615 

.0500 

.2295 

.2000 

.3359 

.3000 

.4392 

.4000 

.5398 

.5000 

.6379 

.6000 

.7335 

.7000 

.8264 

.8000 

.9593 

.9500 

Table  2.7.  Simulated  Frequentist  Coverage  Probabilities  for  A of  BVE’s  under  nu, 
Kuih  ttj?  and  nju 


n 

7 r 

P 

II 

o 

Or 

a = .2 

a = .3 

a = .4 

a — .5 

a = .6 

a = .7 

a = .8 

a = .95 

5 

nu(nuu) 

Kj{nju) 

.1313 

.0567 

.3607 

.2017 

.4733 

.2983 

.5780 

.4043 

.6530 

.4983 

.7500 

.6103 

.8307 

.7160 

.8937 

.8047 

.9820 

.9497 

20 

nu{nuu) 

nj{nju) 

.0810 

.0507 

.2580 

.1897 

.3883 

.3047 

.4747 

.3870 

.5787 

.4970 

.6967 

.6087 

.7667 

.6893 

.8520 

.7933 

.9710 

.9513 

50 

nu{nuu) 

nj(nju) 

.0663 

.0460 

.2337 

.1960 

.3430 

.2940 

.4463 

.3847 

.5510 

.4953 

.6623 

.6000 

.7420 

.6930 

.8413 

.8033 

.9580 

.9500 

100 

nu(nuu) 

nj(nju) 

.0627 

.0497 

.2320 

.2027 

.3363 

.3000 

.4480 

.4037 

.5507 

.5063 

.6253 

.5930 

.7333 

.7013 

.8223 

.7940 

.9610 

.9517 

As  cited  earlier,  the  matching  criterion  of  A is  unaffected  by  the  prior  selection  for 
p (or  0)  due  to  the  factored  posterior.  Except  for  the  Laplace  prior,  all  other  selected 
noninformative  priors  perform  extremely  well  in  matching  the  posterior  probabilities 
of  A with  the  corresponding  frequentist  coverage  probabilities. 

When  the  original  random  variables  have  continuous  distributions,  Jeffreys’  prior, 
in  the  absence  of  nuisance  parameters,  is  known  to  be  a second  order  probability 
matching  prior.  There  is  no  general  theory  supporting  this  feature  of  Jeffreys’  prior 
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for  discrete  distributions.  However,  it  still  seems  worthwhile  to  compare  the  Bayesian 
distribution  for  those  parameters  of  interest  (e.g.,  p and  0)  at  various  points  with  the 
corresponding  frequentist  probabilities  for  moderate  sample  size,  n = n\  + n2  + 7ii2. 

In  the  following  discussion,  we  shall  establish  the  general  formula  for  computing 
the  frequentist  probabilities,  where  the  underlying  random  variables  are  discrete,  and 
marginal  posterior  densities  of  model  parameters  are  standard  or  posterior  quantiles 
are  accessible.  Later  on,  BVE  examples  are  considered  for  illustration. 

We  consider  the  general  setup,  where  V = (Vi , follows  some  discrete 

distribution  with  joint  pdf  given  by 

/v(v|0)  = P(V  = v|0),  (2.36) 

where  6 = (0l5  • • • , 0//)',  v = (vi,  ■ ■ • , Vi)',  and  l may  differ  from  l'. 

Let  7r(0|v)  be  the  proper  posterior  density  given  v and  prior  7r(0).  Suppose  that 

the  sequence  of  marginal  posterior  quantiles  {0,v^(v)  : 1 < i < l'j  are  obtainable  for 
0 < a < 1.  Then  at  the  first  stage,  for  any  fixed  0*  and  a,  the  frequentist  distribution 
evaluated  at  the  posterior  a-quantile  is  given  by 

£ /v(v|fl).  (2.37) 

In  particular,  let  V = (Ni,N2)  with  pdf 

/(m,n2|p,0)  = — -pni(l  - p)n20(ni+n2)(l  - (2-38) 

ni!n2!ni2! 

We  say  (Ni,N2)  ~ Mult(n;  0p,  0(1  — p)).  The  beta  marginal  posteriors  of  p and 
0,  corresponding  to  7p/,  7 vj,  or  ttr,  have  already  been  derived  in  Table  2.2.  Given 
a G (0, 1)  and  (rii,n2,ni2),  let  £^(«i,n2)  and  ^\n\:n2)  denote  the  Qf-quantile  for 
B(ni  + ap,  n2  + bp)  and  B(n\  +n2+  a#,  ni2  + b^),  respectively.  According  to  (2.38), 
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the  frequentist  coverage  probabilities  for  p and  0 are  given  by 


E 


n\ 


(a)/,..„  ob'K™-*  ~i)! 


p‘(l  - p)^(i+J)(l  - 4 ) , (2.39) 


E /(bilp>^) 


E 


n\ 


— </>i+J(l  - 0)n-<--»  . 


(2.40) 


For  ACBVE’s,  one  simply  identifies  the  binomial  distribution,  N\  ~ Bin(n;p). 
From  Table  2.5,  p|d  ~ B(ni+ap,  ri2+bp).  Applying  (2.37),  one  obtains  the  frequentist 
coverage  probability 


£ 

{i:p<£(pa\i,n-i)} 


*)■ 


p'(i-pY 


(2.41) 


These  numerical  results  will  be  presented  with  the  transformed  Q-Q  plots  by 
using  the  SPLUS  software  package.  We  convert  the  Q-Q  plot  axes  from  quantiles 
into  the  corresponding  probabilities.  Figures  2. 1-2.3  and  2. 4-2. 6 demonstrate  the 
exact  frequentist-versus-posterior  transformed  Q-Q  plots  for  p and  0,  respectively, 
based  on  (Ni,N2)  ~ Mult(n;  4>p,  0(1  — p))  with  the  combination  of  n = 10,  50,  100, 
and  (p,  4>)  = (-75,  .2),  (.5,  .4),  (.33,  .6),  (.125,  .8). 

It  follows  from  Figures  2. 1-2.6  that  the  frequentist  probabilities  of  p or  4>  given  in 
(2.39)-(2.41)  under  selected  noninformative  priors  do  depend  on  the  values  of  param- 
eters and  choices  of  a. 

Due  to  the  discreteness  of  the  underlying  distribution,  the  matching  concept  can- 
not be  justified  rigorously.  Our  general  finding  is  that  when  Jeffreys’  rule  is  applied, 
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(c)  a{<j>  = .6) 


Figure  2.1.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  0 vs  Frequentist  Prob- 
abilities of  0 under  Jeffreys’  and  Laplace’s  priors  and  samples  of  size  10.  (a)  trans- 
formed Q-Q  plot  for  0 = .2;  (b)  transformed  Q-Q  plot  for  <f>  = .4;  (c)  transformed 
Q-Q  plot  for  0 = .6;  (d)  transformed  Q-Q  plot  for  0 = .8. 
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(c)  a (4>=  .6) 


Figure  2.2.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  0 vs  Frequentist  Prob- 
abilities of  0 under  Jeffreys’  and  Laplace’s  priors  and  samples  of  size  50.  (a)  trans- 
formed Q-Q  plot  for  0 = .2;  (b)  transformed  Q-Q  plot  for  0 = .4;  (c)  transformed 
Q-Q  plot  for  0 = .6;  (d)  transformed  Q-Q  plot  for  0 = .8. 
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Figure  2.3.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  0 vs  Frequentist  Prob- 
abilities of  <f>  under  Jeffreys’  and  Laplace’s  priors  and  samples  of  size  100.  (a)  trans- 
formed Q-Q  plot  for  4>  — -2;  (b)  transformed  Q-Q  plot  for  (j)  = .4;  (c)  transformed 
Q-Q  plot  for  (f>  = .6;  (d)  transformed  Q-Q  plot  for  (j)  = .8. 
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Figure  2.4.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  p vs  Frequentist  Prob- 
abilities of  p under  Jeffreys’  and  Laplace’s  priors,  and  samples  of  size  10.  (a)  trans- 
formed Q-Q  plot  for  p — .125;  (b)  transformed  Q-Q  plot  for  p = .33;  (c)  transformed 
Q-Q  plot  for  p = .5;  (d)  transformed  Q-Q  plot  for  p = .75. 
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Figure  2.5.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  p vs  Frequentist  Prob- 
abilities of  p under  Jeffreys’  and  Laplace’s  priors,  and  samples  of  size  50.  (a)  trans- 
formed Q-Q  plot  for  p = .125;  (b)  transformed  Q-Q  plot  for  p = .33;  (c)  transformed 
Q-Q  plot  for  p = .5;  (d)  transformed  Q-Q  plot  for  p = .75. 
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Figure  2.6.  Transformed  Q-Q  Plots:  Posterior  Probabilities  of  p vs  Frequentist  Prob- 
abilities of  p under  Jeffreys’  and  Laplace’s  priors,  and  samples  of  size  100.  (a)  trans- 
formed Q-Q  plot  for  p = .125;  (b)  transformed  Q-Q  plot  for  p = .33;  (c)  transformed 
Q-Q  plot  for  p — .5;  (d)  transformed  Q-Q  plot  for  p = .75. 
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either  on  a regular  basis,  or  in  the  two-stage  specification,  one  usually  obtains  a rel- 
atively better  matching  property  in  comparison  with  Laplace’s  rule.  If  one  formally 
extends  the  matching  concept  from  the  continuous  case  to  the  discrete  case,  then  7 tj 
and  7 r ju  are  indeed  second  order  matching  priors. 

Tail  posterior  percentiles,  associated  with  a = .05  and  a = .95,  are  fairly  close 
to  their  frequentist  counterparts,  which  are  robust  to  parameter  values  and  the  non- 
informative  priors  that  are  selected.  This  suggests  that  one  could  rely  on  the  90% 
Bayesian  credible  set  to  achieve  the  corresponding  frequentist  coverage  probabilities 
for  p and  0. 

It  appears  that  7 vj  and  7Tj/  (for  the  Marshall-Olkin  BVE),  or  nju  and  ttuu  (for 
ACBVE’s)  have  more  similar  converge  probabilities  for  parameters,  p and  0,  in  the 
central  part  (p  = .33,  .5,  0 — -4,  .6)  than  those  near  the  boundary  (p  = .125,  .75, 
0=-2,  .8). 


2.7  Identifiable  ACBVE  via  Informative  Priors 


The  Bayesian  posterior  distribution  for  <fi  (or  0)  varies  among  different  ACBVE 
assumptions.  For  instance,  the  posterior  distribution  of  0 is  exactly  the  same  as  the 
prior  when  the  Gumbel-B  ACBVE  is  assumed.  For  other  ACBVE’s,  it  is  a weighted 
result  of  prior  and  the  likelihood  function.  The  Bayesian  estimator  of  <0  is  uniquely 
determined  and  essentially  is  the  midpoint  of  R( A,  p)  when  the  noninformative  prior 
is  assumed.  The  prior  information  of  0 may  occasionally  be  reflected  from  history 
(e.g.,  medical  data,  social  science  census).  The  strength  of  Bayesian  methodology  in 
resolving  the  nonidentifiability  allows  one  to  adopt  the  prior  information  to  provide 
reasonable  inference.  To  see  this,  we  conduct  a simulation  study  for  the  Block-Basu 
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ACBVE,  where  0 follows  a conditional  beta  prior  which  is  extended  from  the  first- 
stage  uniform  prior  5(1,1) 

»(#|A)  « (()  (l  - 0 • (2.42) 

Without  loss  of  generality,  we  let  A = 1.  Priors  5(4,16),  5(20,80),  5(10,10), 
5(50, 50),  5(14, 6),  and  5(70, 30)  are  assumed.  Since  the  posterior  of  0 here  is  inde- 
pendent of  p and  5 , the  samples  of  sizes  10  and  50  are  generated  from  T G(l,l). 
The  posterior  means  and  standard  deviations  of  4>  are  summarized  in  Table  2.8. 


Table  2.8.  Posterior  Estimates  for  <f>  under  three-parameter  ACBVE’s  and  Beta  Priors 


tt(</>|A) 

EM 

EMVM'Hm) 

n = 10 

n — 50 

5(4,16) 

.2 

.2183  (.1213) 

.2043  (.0945) 

5(20,  80) 

.2 

.2278  (.0862) 

.2042  (.0502) 

5(10,10) 

.5 

.5550  (.2098) 

.5088  (.1330) 

5(50,50) 

.5 

.5543  (T782) 

.5086  (.0882) 

5(14,6) 

.7 

.8011  (.2480) 

.7142  (.1413) 

5(70,30) 

.7 

.8146  (.2281) 

.7066  (T076) 

Figure  2.7  provides  a more  detailed  insight  of  the  simulation  results.  In  each  of 
the  sub-plots,  a specified  prior  distribution  of  (j>  is  drawn  along  with  the  associated 
posterior  distributions  of  4>  given  samples  of  sizes  10  and  50.  These  graphs  suggest 
that  priors  eventually  dominated  posteriors  as  sample  size  increases.  Bayesians  guide 
the  inference  towards  the  truth  even  when  the  likelihood  is  not  fully  identified. 
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Figure  2.7.  Beta  Prior  Distributions  and  the  Associated  Posterior  Distributions  of 
0 under  n = 10  and  n — 50:  (a)  prior:  5(4,16);  (b)  prior:  5(20,80);  (c)  prior: 
5(10,10);  (d)  prior:  5(50,50);  (e)  prior:  5(14,6);  (f)  prior:  5(70,30). 


CHAPTER  3 

BAYESIAN  ANALYSIS  OF  GENERALIZED  BIVARIATE  EXPONENTIAL  MODELS 

3.1  Introduction 


In  lifetime  analyses,  there  are  circumstances  where  auxiliary  information  (covari- 
ates), such  as  external  forces  or  characteristics  of  individuals,  may  have  impacts  on 
lifetime.  In  this  chapter,  a class  of  generalized  BVE  models  is  introduced,  in  which 
(T,  A,'L)  is  adjusted  with  covariate  information.  To  distinguish,  we  shall,  from  now 
on,  refer  to  the  BVE  models  (excluding  covariates)  as  baseline  BVE  models.  Es- 
sentially, the  generalized  linear  model  technique  is  used  for  the  construction  of  BVE 
models  with  covariates.  This  technique  is  applied  to  the  exponential  variate,  T,  and 
binary  variates,  A and  T.  Along  with  the  choice  of  the  canonical  link  in  GLM,  the 
generalized  BVE  models  are  obtained  as  combinations  of  log-linear  models  and  logit- 
linear  models.  In  the  terminology  of  survival  analysis,  these  are  Cox’s  proportional 
hazard  models  and  proportional  odds  models.  Unlike  the  baseline  BVE  models,  fea- 
sible inclusion  of  covariates  allows  for  dependence  between  T and  (A,  4/)  by  adding 
in  common  factors  shared  between  the  two.  Random  censoring  is  assumed.  Denoting 
the  censoring  variable  by  C,  all  we  observe  is  min (T,C)  besides  (A,  'L).  Bayesian 
methodology  with  noninformative  priors  is  employed  for  statistical  analyses  based  on 
the  marginal  likelihood  function  associated  with  (T,  A,^).  Laplace’s  flat  prior  and 
Jeffreys’  prior,  each  suitably  modified  by  the  stagewise  specification,  are  selected  as 
candidates.  The  organization  of  this  chapter  is  as  follows.  The  generalized  BVE 
models  are  constructed  in  Section  3.2.  In  Sections  3.3  and  3.4,  prior  densities  and  the 
associated  posterior  densities  are  derived  for  ACBVE’s  and  the  Marshall-Olkin  BVE, 
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respectively.  The  propriety  of  posteriors  is  examined  under  the  criteria  of  Ibriham 
and  Laud  (1991)  and  Shao  and  Chen  (1998).  Section  3.5  discusses  the  strength  of 
modified  noninformative  priors  in  the  context  of  factorial  experiment  design.  In  Sec- 
tion 3.6,  the  Gibbs  sampling  and  Metropolis-Hasting  sampling  methods  are  used  for 
the  generation  of  marginal  posteriors.  We  illustrate  our  findings  with  a lung  cancer 
clinical  trial  dataset  in  Section  3.7. 

3.2  Generalized  BYE  Models 


In  the  general  setting,  each  observation  (T,  A,  \P)  is  now  concatenated  with  a 
covariate  vector  and  a censoring  indicator  lc  assumed  to  be  independent  of  (T,  A,  'L). 
Let  z,  (fc  x 1)  denote  the  covariate  vector  for  the  ith  observation,  and  write  Z = 
[zi,  • • • , zn+\  for  the  n+xk  design  matrix  and  Zn  = [zi,  • • • , zn\  which  will  be  assumed 
of  rank  k throughout.  Instead  of  an  explicit  use  of  a censoring  variable,  we  distinguish 
censored  observations  from  failures  via  the  subscripts.  Specifically,  indices,  1,  • • -,n, 
will  mark  the  failures,  while  indices,  n+1,  •••,«+,  will  mark  censored  individuals.  The 
data  for  ith  individual  are  now  expressed  by  the  quadruplets  dj  = (£*,  Si,  ipf,  z,).  Let 
d = {d,  : 1 < i < n. |_}.  The  contribution  to  the  likelihood  from  the  censoring  variable 
C does  not  depend  on  the  parameters  associated  with  (T,  A,\k),  which  is  given  by 

n+ 

n P{Ti  > U ).  Thus,  the  marginal  likelihood  function  of  (T,  A,'!')  is  validly  used 

i=n+ 1 

for  statistical  inference,  including  in  the  derivation  of  noninformative  priors. 

In  this  section,  we  restrict  ourselves  to  the  Marshall-Olkin  BVE.  The  generalized 
linear  model  (GLM;  cf.  McCullagh  and  Nelder,  1996)  technique  relates  the  mean  of 
outcome  variables  of  the  regular  exponential  family  to  covariates  through  the  “link 
function” . This  technique  is  appropriately  extended  to  the  model  generalization  of  T, 
A,  and  T.  One  common  choice  of  the  link  function  is  the  canonical  link.  The  model 
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structures  are  given  by 


log(A(z)) 

= Z0, 

(3.1) 

logit(p(z)) 

= z 'll 

(3.2) 

logit(^(z)) 

r 

= z a, 

(3.3) 

where  z = (1,  Z2,  • • • , z*)' . Baseline  BVE  models  can  be  viewed  as  GLM’s  when  the 
mean  of  the  response  does  not  depend  on  covariates,  i.e.,  A = e^1,  p = e7l/l  + e71, 
and  <j)  — eai /I  + eai.  Physical  interpretations  for  regression  coefficients  (3j  s,  7/s,  and 
oij  ’s  may  vary  from  one  text  to  another. 

In  view  of  the  nature  of  the  parameters  A,  p,  and  </>  in  the  BVE,  one  may  adjust 
these  parameters  with  covariates  via  hierarchical  modeling.  Let  A(z),  p(z),  and  <f>{ z) 
denote  the  covariate- adjusted  functions  for  A,  p,  and  0,  respectively.  As  given  in 
(3.1)-(3.3),  both  Cox’s  proportional  hazard  model  and  the  logistic  regression  model 
are  adopted  in  the  competing  risks  context  for  modeling  A(z)  and  p( z)  (or  </>(z)), 
respectively. 

3.3  Generalization  of  ACBVE  Models 


Based  on  the  GLM  technique  with  the  canonical  link,  the  class  of  generalized 
ACBVE  models  (cf.  Section  2.4)  are  generated  by  assuming  (3.1)  and  (3.2).  Parame- 
ters 4>  appear  in  the  same  way  as  in  the  baseline  model.  Then  the  marginal  likelihood 
function  for  realization  {(i;, 5*; Zj)  : 1 < i < n+}  is  given  by 


L(/3,7,0|d)  oc  H 

i= 1 


1|0sR(/3.7)]' 


(3.4) 
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3.3.1  Noninformative  Priors  and  Posteriors 

To  find  noninformative  priors  on  the  basis  of  (3.4),  we  employ  the  stagewise  elicita- 
tion as  the  Fisher  information  matrix  can  be  not  fully  specified.  At  the  first  stage,  we 
propose  Laplace’s  prior,  i.e.,  7 = 77^  (<£|/3, 7)  oc  1 for  <fi  e R(f3, 7),  some  specified 
compact  set.  Thus,  the  integrated  likelihood  function  Li((3,y\d)  is  given  by 


At  the  second  stage,  Laplace’s  prior,  7 rj/2  = 7Tt/2(/3, 7)  a 1,  and  Jeffreys’  prior, 
7 tj2  = 7r./2(/3, 7),  are  computed  from  (3.5).  Suppose  that  7 tuu  = 7h/i/(/3, 7,  </>) 

7TC/!  (0|/3, 7)7rc/2 (/3, 7)  is  specified.  The  corresponding  joint  posterior  density,  denoted 
by  ttuu(P,1,  0|d),  can  be  factored  as  irui(<l>\P,y)nuu{P\d)nuu(.'ir\d),  where 


7|d) 

J L((3, 7,  </)|d)7r[/1  (0|/3, 7)  d<f> 


<x 


(3.5) 


(3.6) 


and 


(3.7) 


Next,  the  Fisher  information  matrix  derived  from  (3.5)  is  given  by 


where  M(7)  is  a n x n diagonal  matrix  with  the  ith  diagonal  element  given  by 
/ / 

Mj( 7)  = ezi '/( l + ezi^)2.  Thus,  Jeffreys’  prior  is  obtained  as  7 rj2  = 7r (/3, 7)  oc 
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[|Z'Z||ZBM(7)Zn|]a.  And  the  posterior  under  7rJ{/  = Trju(p,j,  0)  oc  'KUl{cf>\P,/y)TTj2{P,j) 
is  given  by 


KjuiP,  7,  0|d)  oc  7 TVl  (01/3, 7)7rc/c/(/3|d)7rJt/(7|d), 
where  nuu(P |d)  is  given  in  (3.6),  and 

^(7|d)  a n 7 |Z;M(7)Zb|*. 

<= i V1  + e i7  J 


(3.8) 


(3.9) 


To  compute  the  determinant,  |Z/nM(7)Z„|,  one  may  refer  to  the  following  result 
attributed  to  Cauchy  and  Binet  (cf.  Noble  and  James,  1977). 

Cauchy-Binet  Result  Let  A = [ai,  ■ • • , a„]'  be  a n x k (k  < n)  matrix  of  rank 
k and  M*  be  a n x n diagonal  matrix  with  the  ith  diagonal  element  M* . Denote 
c(ail,---, a*J  = |[ail,---,aj|2,  for  1 < ix  < • • • < ik  < n,  and  P^k)  = {(*i,  — , t*)  : 
1 < ii  < ■ • ■ < ik  < n,  c(ah,  • • - ,a ik)  > 0}.  Then 


|A'M*A|  = j:c(a.1,...,ail)(nO. 

□(<=)  u=l 


(3.10) 


By  (3.10), 


|z;m(7)z„|  = 


Y.  c(z„,--',z,J(n 

p(fc)  11  = 1 


(3.11) 


Thus, 


T!juh\<i)  OC  II 


(,(e’:7;)  I 

f 

c(zii  > ' 

•>Zu)(II  Miu( 7)) 

\1  + e J 

p(^) 

11=  1 

> 

(3.12) 


3.3.2  Propriety  of  Posteriors 


The  immediate  task  is  to  ascertain  the  propriety  of  the  posterior  densities  (3.4) 
and  (3.8).  Since  77^(01/3,7)  is  proper  for  any  given  (/3, 7),  we  may  proceed  directly 
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to  the  marginal  posterior  densities 

Kuvifl, 7|d)  = 7rt/t/(/3|d)7r[/t/(7|d),  (3.13) 

and 

TTju(f3,y\d)  = iruu({3\d)irju(/y\d).  (3.14) 

The  convergence  of  / nuu((3\d)  d(3  follows  from  a theorem  of  Ibrahim  and  Laud 
(1991),  which  provides  sufficient  conditions  for  the  propriety  of  Jeffreys’  prior.  For 
completeness,  the  theorem  is  stated  below. 

Theorem  (Ibrahim  and  Laud,  1991)  Suppose  Y\,  • • •,  Yn  are  independent  variates 
with  pdf’s 

f(yl\dt,ujl,(j))  = expiui/rjiyiOi  - b(9i))  + g(yi,y)}  (3.15) 

where  Oi  = 9( z)/3)  and  rj  and  • • • , u)n  are  known  constants.  If  the  pdf’s  are  bounded 
functions  of  the  9i,s  and  Zn  = [zi,  ■ • • , zn]'  is  of  full  column  rank,  then  under  Jeffreys’ 
prior,  a sufficient  condition  for  the  posterior  distribution  of  (3  to  be  proper  is  that 

Js  exp {(w/v)(v9-b(0)}  j d9  < oo,  (3.16) 

where  Sg  denotes  for  the  integral  domain  of  9. 

To  prove  the  finiteness  of  / 7ruu(P\d)  d(3  analytically,  we  begin  with  the  inequality 
nuu(0\d)  < IJ  (eZi@  exp(-eziPti)  \ . (3.17) 

i= 1 ' ' 

One  may  note  that  the  right-hand  side  of  (3.17)  corresponds  to  (3.15)  via 

• f(U)  = exp  (— ezA*  + z-/3), 

» rj  = ui  = = un  = 1, 

• 9{  = — exp(z)/3),  and 
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• b(9i)  = - log (-9i)  and  Sg  = (-00, 0). 

Since  Jeffreys’  prior,  proportional  to  |Z^Z„|^,  is  a constant,  one  can  verify  (3.16) 
from 


/0  1 r 00 

exp(te  + \og(-9)}(l/92)-2  d6=  / e~te  d6  = t~l 

-00  J 0 


< OO. 


The  next  theorem  examines  the  propriety  of 
Theorem  3.1  Let  {Aj  : 1 < i < n)  be  independent  binary  outcomes  with  P( A*  = 
l|z i)  = ez*7/(l  + eziT)  = 1 — P(Aj  = 0|zj).  Suppose  that  Jeffreys’  prior  is  assumed, 
which  is  given  by  717  ( 7)  oc  |Z)jM(7)Zn|  2 , where  M(7)  is  defined  in  Section  3.3.1. 
Then  717(7)  is  a proper  density  if  the  matrix  Zn  is  of  full  column  rank.  Therefore, 
the  associated  posterior  density,  equivalent  to  (3.12),  is  a proper  posterior,  i.e., 

J T/[/(7|d)  dy  < 00. 

Proof  We  shall  show  that  Jeffreys’  prior  in  this  case  is  proper,  which  implies  the 
propriety  of  the  corresponding  posterior. 

Apply  Jensen’s  inequality  to  717(7).  Then 


£ 


c z 


U 5 


■»z<j  n 


ez>uT 


(*) 


=1  V(l  + ez^7)2 


< Etc(5 


'ii  ? 


3(fc) 


n 

L n=l 


(i^l )±\\ 
Vi+eoyj 


Next,  we  examine  the  finiteness  of  integrals  of  the  type 


(ez=uT)2 


’(zii,-",ziik)]*/ n e u ,*  ~ ) ^7 

u=i  V 1 + e / 


(3.18) 


By  change-of- variables  ru  = z'  7 and  = eru/2,  then 


[^(z*i ) ■ ■ ■ , Zjfc)]  * / n - 
J u=l  V 1 


7)| 


e *«  ' )2 

+ ez^7 


c?7 


(3.19) 
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- *(£!:>.) 

(3.20) 

■ .?,(/.  „7) 

(3.21) 

< OO. 

The  result  follows. 

We  shall  then  appeal  to  the  result  of  Shao  and  Chen  (1998)  for  verifying  the 
propriety  of  nuu(j\d). 

Theorem  (Shao  and  Chen,  1998)  Suppose  that  Yi,---,Yn  are  independent  binary 
response  variables  with  pdf’s  P(Yi  — 1)  = F( z'^)  = 1 — P(Yi  — 0),  where  F(.)  is 
some  probability  distribution  function.  Define  U*  = Diag[u*,  •••,«*]  and  Z*  = Z^U*, 
where  yi  is  the  realization  of  F*  and  u*  — (— \)Vi.  Then  given  Laplace’s  prior,  the 

n , , 

joint  posterior  of  7 is  given  by  f]  {[F(z i7)]2/i[l  — F( Zj7)r  w}-  This  posterior  density 

i=  1 

is  proper  if  and  only  if: 

(Cl)  Z„  is  of  full  rank, 

(C2)  there  exists  a positive  vector  v £ Rn,  i.e.,  each  component  > 0,  such  that 
[Z*]'v  = 0,  and 

OO 

(C3)  / \y\k  dF(y)  < 00. 

— OO 

Furthermore,  (C3)  holds  for  logit,  probit,  complementary  log  and  certain  t-links. 

Thus,  if  in  addition  to  (Cl),  (C2)  is  assumed  with  u*  — (—1)^  and  Z*  = Z'nU *, 
then  convergence  of  / 7Tt/{7-(7|d)  follows  since  F is  the  logistic  distribution  in  the 
present  case.  In  summary,  to  ensure  the  propriety  of  posteriors,  one  needs  to  assume 
(Cl)  when  the  prior  is  itju,  and  both  (Cl)  and  (C2)  when  the  prior  is  7r vu. 
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3.4  Generalization  of  the  Marshall-Olkin  BYE  Model 


In  order  to  include  covariate  information  in  the  Marshall-Olkin  BVE  model,  we 
adopt  the  modeling  schemes,  (3.1)-(3.3),  and  the  marginal  likelihood  approach  de- 
scribed in  Section  3.2.  Given  : 1 < i < n+},  the  marginal  likelihood 

function  is  given  by 


L(/3,7,a|d)  a jj 


i= 1 


■>Pi 


z'/3  ( (ez-7)^  ) [ (ez»a)^ 


1 + ez*7 


1 + e: 


z O' 


> (exp(-ez>^ 

i=i 


(3.22) 


3.4.1  Noninformative  Priors 


Typical  noninformative  priors  are  computed  based  on  (3.22).  Consistent  with 
Section  3.3,  Laplace’s  prior  and  Jeffreys’  prior  are  regarded  as  candidates.  Under 
Laplace’s  prior,  the  joint  posterior  density  is  simply  proportional  to  (3.22),  which  is 
given  by  the  expression 

vrt/(/3, 7,  o|d)  = 7r[/(/3|d)7r[/(7|d)7r[/(a|d),  (3.23) 


where 


7T[/(/3|d)  oc 


n 

M 7|d)  OC  n 
i=  1 


n 

7Tf/(a|d)  a n 
2=1 


(ezia)^  1 

1 + ez'ia  J ‘ 


Next,  let 


(3.24) 


(3.25) 


(3.26) 


Mi(v)  = Diag[MM(v),---,Mi)n(v)] 


(3.27) 
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and 


M2(v)  = Diag[M2ii(v),  • • • , M2|7l(v)], 


where 


Mu(y) 


M2)i(v) 


/ 


(1  + ez'.v)2’ 


1 + ez'.v  ’ 


The  Fisher  information  matrix  is  given  by 


Tlj^Z  Z Ofcxfc 

Ojfcxfc  n_1Z^[Mi(7)M2(a)]Z„ 


n 


®kxk 
0 kxk 


1Z„M1(a)Z 


n 


Then  Jeffreys’  prior  is  given  by 

kj  = 7,  a)  oc  |Z^[M1(7)M2(a)]Zn|1/2|z;M1(a)Zn|1/2. 

By  the  determinant  expansion  given  in  (3.10),  one  can  rewrite  (3.31)  as 


7T7(/3,7,a)  a < £ c(zh,  • • •,  zh)  MMu(7)M2,iu(a) 

p(i)  \u=l  ; 


(fc  N 

n Mtiu(a) 

U=1  / 

rn 


. 


Accordingly,  the  posterior  density  is  given  by 


T/(/3, 7,  «|d)  = 7rt7(/3|d)7r7(7,  a|d), 


where  7T[/(/3|d)  is  the  same  as  in  (3.24),  and 


7Tj(7,a!|d)  oc  J] 

i—1 


ipi 


(ez.7)*i  \ l (ezia)^ 


1 + ez>7  / l 1 + e; 


z a 


(3.28) 

(3.29) 

(3.30) 

(3.31) 

(3.32) 

(3.33) 
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x 


x 


* c(zu  > ‘ ' ‘ I Z*fc) 

p(fc) 

rn 


< C(ZH  5 ■ ■ ■ > zik) 

p(^) 

k 1 n 


(pM.h)%(»))  • 

J 

(n»*w)  ■ • 


(3.34) 


3.4.2  Propriety  of  Posteriors 


In  order  to  check  the  propriety  of  posteriors,  one  needs  to  verify  the  convergence 
of  / 7Tj/(7|d)  c?7,  /7T[/(a|d)  da,  and  / 717(7,  a |d)  d'y  da. 

Shao  and  Chen’s  (1998)  result  is  applied  to  the  evaluation  of  the  first  two  integrals. 
Let  Z<p,  (n  1 + n2)  x k,  be  comprised  of  row  vectors  {z\  : 1 < i < n and  ipi  — !}• 
Then  / 7r^(7|d)  d'y  converges  if  Z#  is  of  rank  k and  Z^v#  = 0 for  some  positive 
vector  v® , (rai  + n2)  x 1,  where  Z%  — [Z(j,]'U*  and  u*  = (—1)^.  The  convergence  of 
f 7T[;(a|d)  da  holds  if  Zn  is  of  rank  k and  [Z*]'vn  = 0 for  some  positive  vector  vn, 
n x 1,  where  Z*  = Z'nU*  and  u*  = (— 1)^\ 

To  justify  the  propriety  of  (3.34),  we  consider  the  inequalities 

c(z»i,"*,zij  < c(zl,---,zj)  = max{c(zil,---,zifc)}, 

M2,i(a)  < 1, 


for  all  (*i,  • • • , ik)  € and  1 < i < n.  It  follows  that 


J 717  ( 7,  a|d)  c?7  da 


< c(zi >•••,<) 


E/n 

p(*0  J 44  — 1 \ 1 


* ( 


+ e 


z 7 
1 * 


^7 


x 


da, 

p(fc)  ^ u=l  V 1 + e !“  / 


(3.35) 


62 


Repeating  the  steps  (3.19)-(3.21)  in  the  proof  of  Theorem  3.1,  the  convergence  is 
attained  if  Zn  satisfies  rank  k assumption. 

Similar  conclusions  can  be  drawn  as  in  the  ACBVE  cases.  The  propriety  is  es- 
tablished under  Laplace’s  prior  if  both  (Cl)  and  (C2)  hold  in  the  way  stated  above. 
The  calculation  based  on  Jeffreys’  prior  requires  (Cl)  only. 

3.5  Application  to  a Sample  with  Categorical  Covariates 

For  many  statistical  applications,  covariates  (or  explanatory  variables)  are  cate- 
gorical taking  on  a finite  number  of  values.  In  these  applications,  the  task  of  finding 
posteriors  in  Sections  3.3  and  3.4  is  greatly  simplified.  Primarily,  we  consider  a fac- 
torial experiment  design  in  the  current  context.  Suppose  that  certain  interpretable 
parameters  such  as  the  main  effects  or  interaction  effects  among  the  factors  are  of 
interest.  In  the  following  discussion,  a variety  of  probability  matching  properties  are 
investigated  among  linear  functions  of  fa's  when  Laplace’s  or  Jeffreys’  rule  is  applied 
to  a complete  dataset. 

3.5.1  Defining  the  Design  Matrix 

In  a factorial  experiment  design,  the  underlying  sample  can  be  classified  into  k 
mutually  exclusive  categories  (all  possible  combinations  of  the  factor  levels).  The 
covariate  vector  for  individual  i,  z*  = (zy,  • • • , Zj^)'  is  defined  such  that  zy  is  the 
indicator  variable  for  category  j,  for  1 < j < k.  Thus,  the  design  matrix  Z is  a 
composition  of  the  unit  vectors. 
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3.5.2  Likelihood  Functions  and  Posteriors 


IV  IV  IV 

Let  nh  = E l[Zitj=i],  n2j  = _E(1  - Wil[*ij=i]>  = £(1  - 


i= 1 


-1 


n.  = rii  - + ri2  + n\2.,  and  tt  = E Ll[z,  =i]-  Then  one  can  simplify  the  likelihood 

i=  1 

functions  (3.4)  of  ACBVE,  and  (3.22)  of  the  Marshall-Olkin  BVE  to 

k k ( (p"fj)nij  f 

L(/3, 7, 0|d)  oc 

k 

L(0, 7,  a|d)  oc  []  {exp[n.  ./?,■  - 
j= i 


(e70nij  (ea0 


X JJ  ( (1  + e^')niJ+n2J  (1  + eQ0"  j 1 ' 


Accordingly,  the  priors  7 tju,  and  7 rj  given  in  (3.9)  and  (3.32)  reduce  to 

*L  / (eft)  a \ 

7?  4*)  oc  n 1 + e7j  I 1[0efl(^,7)]> 


^(/3,7, a)  oc  n{fe 


7=1 


(eai ) 2 


(1  + eQi) 


The  corresponding  posteriors  under  n uu,  nju,  ttu,  and  7 rj  are 


(3.37) 


(3.38) 


(3.39) 


>7/t/(/3,7,0|d)  oc  n{«P[«.,ft-e%]} 

j=i 


x n| 

7=1  1 

f (e7j  )"9 

[(1  + e^)n 

k 

<*  n{ 

expfn.^-  . 

7 = 1 

k 

x n< 

7=1 

f (e7i)"9H 

[(1  + e^)n 

(3.40) 


(3.41) 
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K 

ttu({3,  7,  a|d)  oc  [J  {exp[n.  ./?,■  - e^ttj]} 


3 = 1 

k ( („'u\n  l 


(e7j’)”1->  (eQj  ),'ij  1 "‘3 


aa^ni,.+»2. 

x TT  , 

+ (l  + eQ0 


(3.42) 


7Tj(f3, 7,  a|d)  oc  n {exp [n^j  - e^ttj]} 

3= 1 


£ ( (e'w)B1'+*  (eQj)n6+Tl2j  + 2 

,i,  1 (1  +e^)nii+™i+1  (1  +e“J)"i+5 


(3.43) 


Using  the  transformations:  {Aj  = e^'  : 1 < j < k},  {pj  — ^{77  : 1 < j < k},  and 
{(f) j = rf-nj  : 1 < j < k},  the  marginal  posterior  distributions  are  given  by 


Aj|d,  TT  JU  ( TTuu  TT  j TTu  ) 

~ G{n.pttj), 

(3.44) 

Pj  d,  TT  JU  ( TT  j ) 

~ B{n\i  + .5,  n2j  + .5), 

(3.45) 

Pj  |d,  TTUU  ( TTU  ) 

~ B{nij,n2j), 

(3.46) 

0j|d,  TT  JU  ( TTj  ) 

~ -Bfai,.  + + -5,  n12j  + 1), 

(3.47) 

0j|d,  TTuu  ( ^U  ) 

~ B(nij  +n2j,n12j). 

(3.48) 

The  posteriors  for  fy’ s,  7j’s,  and  ctj's  can  be  obtained  via  one-to-one  transforma- 
tion. 


3.5.3  Contrasts  of  Interest:  Linear  Functions  of  8/ s 


Consider  the  one-to-one  reparameterization  6 = A-1/3  for  some  nonsingular  ma- 
trix A-1.  The  contrasts  {0i,  •••,#*;}  may  particularly  correspond  to  the  ratios  of 
hazard  rates  of  adjacent  categories  in  the  logarithm  scale.  Write  A'  = [a1;-  • * , a*]. 
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Due  to  the  factored  likelihood  function,  posterior  marginals  of  0/s  are  found  as  fol- 
lows. The  marginal  likelihood  function  of  6 is  given  by 

k 

L(9  Id)  a ex  p{£K a'j9  -exp(a jO)tj]}.  (3.49) 

i= i 

The  corresponding  Fisher  information  matrix  is 

I{0)  = n~1A,[Z,Z}A. 

Hence,  nj(0)  oc  1,  and  I_1(0)  = (Iv'(0))  is  a constant  matrix  independent  of  9.  Using 
the  formulae  (1.36),  (1.38),  and  (1.40),  one  can  conclude  that  both  the  second  order 
optimal  and  HPD-based  matching  properties  for  6/ s hold  under  tcuu,  kju,  Tp/,  and 
7 rj.  The  proofs  are  deferred  to  Appendix.  In  order  to  verify  the  matching  properties 
numerically,  we  shall  demonstrate  two  simulation  results  in  Section  3.7. 

3.6  Sampling  Schemes 

In  Bayesian  framework,  the  posterior  density  tightly  anchors  inference.  As  one  has 
seen  in  Sections  3.3  and  3.4,  one  may  be  confronted  with  very  complicated  multivariate 
posterior  densities  so  that  inference  requires  high-dimensional  numerical  integration. 
A more  attractive  alternative  is  to  conduct  numerical  integration  via  the  Markov 
Chain  Monte  Carlo  (MCMC)  methods,  in  particular,  the  Gibbs  sampling  method. 
This  requires  generating  samples  from  the  conditional  distributions  of  the  parameter 
given  the  other  parameters  and  data.  Many  of  these  conditional  densities  are  non- 
standard, and  one  needs  to  employ  the  Metropolis-Hasting  algorithm  to  generate 
samples  from  these  conditionals. 

3.6.1  Gibbs  Sampling 


The  Gibbs  sampler  (cf.  Gelfand  and  Smith  (1990),  Casella  and  George  (1992), 
Dellaportas  and  Smith  (1993))  is  a Markovian  updating  scheme  enabling  one  to  obtain 
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samples  from  a joint  distribution  via  iterated  sampling  from  full  conditional  distri- 
butions. The  merit  of  this  sampling  scheme  is  that  one  can  obtain  samples  from  the 
target  marginal  density  avoiding  tedious  multi-dimensional  integration.  Suppose  that 
the  joint  density  of  (Yi,  • • • , Ym)  is  proportional  to  p(yu  • • • , ym).  Let  /x(.),  • • • , /«,(•) 
denote  the  marginal  density  for  accordingly.  Suppose  that  one  aims  to 

sample  from  /)(.).  Ideally,  it  could  be  found  by  integrating  out  the  joint  density,  say 
f(yi,  • • • , ym)  with  respect  to  all  the  arguments  but  y*.  In  practice,  it  may  not  always 
be  feasible,  particularly  for  high-dimensional  densities  known  only  up  to  a multipli- 
cator  constant.  With  Gibbs  algorithm,  one  only  needs  to  recognize  the  conditional 
densities  (up  to  the  normalized  constant) 

PiiVilVi,  • • • , Vi-i,  Vi+i,  • • • , Vm)  oc  p(yu  • • • , ym), 

for  1 < i < m.  Set  initial  vector  (y[0\  • • • ,y$).  This  Markovian  updating  procedure 
is  stated  below. 

Generate  yj1'  from  the  conditional  density 

pi(yil2/20)>---,2/m)/  J ,ym)dyi- 

Next,  generate  y^  from  the  conditional  pdf  P2(y2\y[1\  2/3°\  • • • , y$)-  Proceeding  this 
way,  we  complete  eventually  the  first  iteration.  In  the  (j  + l)th  iteration,  = 

(yi\  • • • > Vm)  is  replaced  for  the  initial  vector  for  proceeding  y^+1\  Under  mild 
conditions,  as  shown  by  Geman  and  Geman  (1984)  for  discrete  distributions,  we  have 

y[J)  ~ Mvi), 

for  fairly  large  J.  However,  quite  often  some  conditionals  Pi(yi\y\-i\)  cannot  be  recog- 
nized as  any  standard  density.  In  order  to  generate  samples  from  these  non-standard 
pdf’s,  one  employs  the  Metropolis-Hasting  algorithm. 
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3.6.2  Metropolis-Hasting  Algorithm 

In  connection  with  Gibbs  sampling,  the  Metropolis-Hasting  algorithm  (cf.  Chib 
and  Greenberg,  1995)  becomes  useful  when  some  of  the  conditional  pdf’s  are  not 
standard,  and  are  known  up  to  a multiplicative  constant.  For  instance,  f(y*\y)  oc 
p(y*).  To  generate  samples  for  Y from  this  pdf,  one  needs  to  specify  a candidate 
generating  density , denoted  by  g(x,y)  with  f g(x,y)dx  — 1.  Define  the  probability  of 
transition  as 


• Generate  y*  from  g(.,y{ and  u from  1/(0, 1). 

• If  u < a(y(°\y*),  then  set  y^  = y*,  else  update  y^  by  y*. 

• Repeat  the  previous  two  steps. 

In  this  way,  we  generate  samples  y^\  y^2\  • • ■,  from  the  marginal  pdf  of  Y.  Typi- 
cally, the  candidate  generating  density  is  a standard  distribution.  Among  those  well- 
developed  selection  tools,  we  shall  employ  the  one  suggested  by  Chib  and  Greenberg 
(1992).  The  scheme  requires  p(y)  oc  c(y)m(y),  where  c(y)  is  uniformly  bounded,  and 
m{y)  is  a density  of  standard  form.  Then  set  g(y*,  y)  = m(y*)  to  draw  candidate 
samples.  Accordingly,  a(y,y*)  reduces  to  min  l}- 


Initialize  with  y = y(°\  The  algorithm  is  described  as  follows. 


3.7  Illustrated  Examples 


3.7.1  Simulation  Studies 


Two  simulations  are  conducted  for  k = 2 and  k — 3 to  verify  the  matching 
properties  stated  in  Section  3.5.  The  procedure  is  described  below.  First,  the  model 
parameters  /3  and  matrix  A are  selected  in  advance.  Gibbs  sampling  scheme  is  used 
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to  generate  posterior  samples  for  Qi,---,Qk,  respectively.  The  simulation  study  is 
based  on  5000  Markovian  iterations  with  1000  samples  generated  in  each  iteration. 
In  each  iteration,  5%,  25%,  50%,  75%,  and  95%  posterior  percentiles  for  Qj' s are 
evaluated.  For  each  Qj,  the  empirical  frequentist  distributions  are  evaluated  at  the 
five  posterior  percentiles.  Given  a specific  percentage  a,  samples  d,  and  prior  7r,  the 
frequentist  probability  P(9i  < 0,-^(d)|0)  is  estimated  by  computing  the  proportion 
of  these  probabilities  among  the  5000  iterations,  which  is  observed  to  be  less  than  a. 
Example  1 For  k — 2,  we  let  (3  = (Pi,  p2)'  = (1, 0.5)'  and  select  a number  of  (n  n n. 2). 
Suppose  the  contrasts  of  interest  are  Qi  = Pi  — P2,  and  92  = P2  so  that 


Here 


e = 


i -l 

o 1 


P- 


A"1 


1 -1 

0 1 


Table  3.1  presents  the  results  of  the  lower  tail  frequentist  coverage  probabilities  of 
Qi  at  0-^(d)’s,  where  7r  = 7 tjv,  iruu,  kj,  ^u,  a = -05,  .25,  .5,  .75,  .95,  and  d are  the 
simulated  data. 


Table  3.1.  Frequentist  Tail  Probabilities  of  Q i and  02  under  nju,  t^uu,  tt j , or  nu,  and 
(PuP2)  = ( 1,0.5) 


(«.i .«.») 

a 

.05 

.25 

.50 

.75 

.95 

(2,2) 

6 1 
^2 

0.0480 

0.0526 

0.2498 

0.2496 

0.4948 

0.4936 

0.7428 

0.7536 

0.9470 

0.9482 

(2,4) 

0i 

@2 

0.0490 

0.0540 

0.2446 

0.2552 

0.4944 

0.5026 

0.7524 

0.7544 

0.9502 

0.9524 

(5,5) 

0i 

0.0514 

0.0534 

0.2444 

0.2586 

0.4960 

0.5012 

0.7488 

0.7504 

0.9462 

0.9506 

(10,10) 

9 1 
^2 

0.0528 

0.0500 

0.2422 

0.2560 

0.4904 

0.4986 

0.7399 

0.7446 

0.9506 

0.9494 

Example  2 For  k = 3,  we  let  (3  = (/?i , /?2,  /33)'  = (1,0.5, 2)'  and  select  a number  of 
(n  15 n 2, n3).  The  contrasts  of  interest  are  Q\  = Pi  — p2,  d2  — P2  — Pi,  and  93  = Pi  so 
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that 


0 = 


1 -1  0 

0 1 -1 

0 0 1 


0. 


Similar  to  Example  1,  a variety  of  (n,1,n_2,n.3)  are  selected  to  examine  the  matching 
property.  The  frequentist  coverage  probabilities  of  Of  s are  calculated  in  Table  3.2. 


Table  3.2.  Frequentist  Tail  Probabilities  of  9\,  92  and  93  under  ttju,  7r uu,  ttj , or  i ru, 
and  = (1,0.5, 2) 


(n.1,n.2,n_3) 

a 

.05 

.25 

.50 

.75 

.95 

0i 

0.0520 

0.2602 

0.5060 

0.7560 

0.9504 

(2,2,2) 

02 

0.0448 

0.2434 

0.4886 

0.7370 

0.9452 

03 

0.0532 

0.2494 

0.5158 

0.7636 

0.9542 

01 

0.0478 

0.2488 

0.4938 

0.7390 

0.9486 

(8,5,2) 

02 

0.0526 

0.2464 

0.4882 

0.7638 

0.9500 

03 

0.0492 

0.2588 

0.5180 

0.7462 

0.9518 

01 

0.0470 

0.2550 

0.5080 

0.7570 

0.9524 

(10,10,10) 

02 

0.0482 

0.2508 

0.4930 

0.7608 

0.9464 

03 

0.0510 

0.2400 

0.4994 

0.7442 

0.9514 

3.7.2  Data  Analysis 

Next,  we  illustrate  Bayesian  procedure  for  generalized  ACBVE’s  with  a lung  can- 
cer clinical  trial  that  was  conducted  by  the  Eastern  Cooperative  Oncology  Group 
(ECOG).  In  the  ECOG  study,  194  patients  with  squamous  cell  carcinoma  were  in- 
cluded, of  which  83  patients  failed  with  local  spread  of  disease  (cause  1),  44  failed 
with  metastatic  spread  of  disease,  while  the  remaining  67  obsercations  were  cen- 
sored. Three  covariates  are  considered:  X\  = performance  status  (ambulatory=0, 
non-ambulatory=l),  X2  = treatment  {A  = 0,  B = 1),  and  AN  = age  in  years. 
Frequentist  analysis  have  been  performed  by  Lagakos  (1978)  who  assumed  a cause- 
specific  model,  and  by  Wada,  Sen  and  Shimakara  (1995),  who  assumed  the  generalized 
Sarkar  ACBVE.  In  the  following  discussion,  Bayesian  analyses  under  noninformative 
priors  are  provided  for  this  study,  in  which  covariate  effects  on  survival  are  of  primary 
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interest.  The  first  analysis  (Til)  was  conducted  by  including  all  three  covariates  given 
Laplace’s  uniform  prior.  In  this  case,  Jeffreys’  prior  becomes  computationally  prob- 
lematic. In  the  second  analysis  (A2),  covariate  X3  (age)  is  excluded  since  its  effect  is 
found  to  be  insignificant.  In  this  case,  both  Jeffreys’  prior  and  Laplace’s  prior  can  be 
used. 

3.7.2. 1 Analysis  1 

In  (Al),  the  covariate  vector  is  given  by  (1,  Yi,X2, X^)',  which  is  associated 
with  coefficients  (3  = (/A, fa, /?3, fa)'  in  the  proportional  hazard  model,  and  7 = 
(71, 72,  73, 7i)'  in  the  logistic  regression  model,  respectively.  M-H  algorithm  within 
Gibbs  sampling  is  adopted  to  access  posterior  information.  To  verify  the  convergence 
of  the  Gibbs  sampler,  the  potential  scale  reduction  factor,  denoted  by  R,  proposed  by 
Gelmen  and  Rubin  (Gelman  and  Rubin,  1992)  is  computed.  R— values  for  fa,  fa,  fa, 
fa,  7i,  72  , 73,  and  74,  based  on  posterior  samples  generated  from  five  distinct  initial 
points,  are  given  by  1.0068,  1.0002,  1.0003,  1.0074,  1.0064,  1.0005,  1.0020,  and  1.0069, 
respectively.  This  confirms,  to  a certain  extent,  the  stability  of  our  Gibbs  sampling 
process.  This  can  also  be  revealed  from  the  marginal  posterior  distributions  given  in 
Figures  3.1  and  3.2. 

Denote  the  a posterior  quantiles  by  qa.  In  Table  3.3,  we  report  posterior  mean 
(B.E.),  standard  deviation  (S.D.),  and  quantiles  9. 025,  9.05,  9.25,  9.5,  9.75,  9.95,  and  9.975 
given  the  assumption  of  Laplace’s  prior. 

In  examining  the  goodness-of-fit  property,  we  adopt  the  method  by  Gelman,  Car- 
lin, Stern,  and  Rubin  (1995).  The  main  idea,  analogous  to  the  standard  p— value 
justification,  is  to  compare  the  observed  outcome  to  its  posterior  predictive  distribu- 
tion, which  is  given  by  py  = P(Ypred  > yobs\0Post),  where  Ypred  denotes  the  posterior 
predicted  variable  of  observation  y0bs,  and  is  assumed  to  be  continuous.  It  reflects 
lack-of-fit  if  this  upper  tail  probability  is  close  to  0 or  1.  In  practice,  py  is  computed 
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-0.03  -0.02  -0.01  0.0  0.01 

(d)  Pi  (R  = 1.0068) 


0.02 


Figure  3.1.  Posterior  Densities  of  Pi,  p2,  Pz,  and  p4  generated  from  five  distinct 
initial  points  via  Gibbs-Metropolis  sampling  scheme  and  the  associated  potential  scale 
reduction  factors,  R’s:  (a)  posterior  densities  of  Pi,  (b)  posterior  densities  of  /?2;  (c) 
posterior  densities  of  /?3;  (d)  posterior  densities  of  P4. 
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(a)  7i  (R  = 1.0069) 


(b)  72  (R  = 1.0005) 


(c)  73  (R  - 1.0020) 


Figure  3.2.  Posterior  Densities  of  71,  72,  73,  and  74  generated  from  five  distinct 
initial  points  via  Gibbs-Metropolis  sampling  scheme  and  the  associated  potential  scale 
reduction  factors,  R's:  (a)  posterior  densities  of  71;  (b)  posterior  densities  of  72;  (c) 
posterior  densities  of  73;  (d)  posterior  densities  of  74. 
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Table  3.3.  ECOG  Posterior  Estimates  under  Main  Effect  Model  and  7 xuu 


8 

Ih 

& 

Pz 

Pi 

7i 

72 

73 

74 

B.E. 

-3.353 

.451 

.328 

-.0071 

2.550 

.057 

-.035 

-.032 

S.D. 

.513 

.163 

.193 

.0084 

1.097 

.341 

.422 

.018 

<7.025 

-4.296 

.133 

-.040 

-.0223 

.593 

-.533 

-.798 

-.061 

9.050 

-4.158 

.175 

.005 

-.021 

.783 

-.480 

-.717 

-.057 

9.250 

-3.743 

.336 

.189 

-.0134 

1.722 

-.192 

-.349 

-.0426 

9.500 

-3.382 

.455 

.326 

-.0068 

2.471 

.044 

-.049 

-.0301 

9.750 

-2.980 

.572 

.464 

-.0007 

3.236 

.286 

.254 

-.0182 

9.950 

-2.523 

.715 

.653 

.0063 

4.172 

.621 

.669 

-.0047 

9.975 

-2.424 

.753 

.702 

.0085 

4.455 

.730 

.777 

-.0016 

empirically  by  counting  the  proportion  of  ynew' s generated  from  posterior  samples 
0post  that  exceeds  y0bs.  More  specifically,  let  fg(.)  denote  the  pdf  of  Ypred . Sup- 
pose that  the  posterior  samples  for  6 are  given  by  {0(1\  • • • , and  y^l\  • • • , y ('m'> 

generated  respectively  from  fgw,  • • • , fg(m)  are  the  predicted  samples.  The  posterior 


predictive  distribution  is  given  by 


2 m 

Py  — \y<vU)V 
"L  j= 1 


(3.50) 


Using  this  analogy,  we  assess  the  goodness-of-fit  for  a binary  outcome  y by 

1 m 

j= l 

This  probability  should  be  interpreted  as  posterior  predicitive  matching  probability. 
Thus,  the  higher  the  probability,  the  better  the  predition  (matching). 

Two  scatter  plots  for  pti's  and  pgd s are  drawn  in  Figure  3.3,  and  the  corresponding 

n n 

empirical  predictive  coverage  probabilities,  X)  Pu/n  and  X!  Ps,/n,  are  computed  in  the 

i=i  i=i 


right-hand  corner  of  each  plot. 
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Figure  3.3.  Goodness-of-Fit  Plot  for  Failure  Observations  under  Analysis  1:  (a) 
Plotting  Bayesian  Predictive  Coverage  Probabilities  for  TVs;  (b)  Plotting  Bayesian 
Predictive  Coverage  Probabilities  for  Aj’s. 
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3. 7. 2. 2 Analysis  2 


From  a practical  viewpoint,  the  previous  analysis  suggests  that  that  “age”  fac- 
tor has  rather  insignificant  effect.  In  the  second  approach,  only  X2  (the  ambulatory 
indicator)  and  X3  (the  treatment  B indicator)  are  included.  Thus,  the  two  studied  co- 
variates now  are  categorical.  Following  the  description  in  Section  3.5,  ECOG  samples 
are  classified  into  four  covariate  combinations.  The  covariate  vector  for  individual  i is 
defined  as  z*  = (ziA,  zi)2,  zi)3,  ziA)'  such  that  ziA  = l[(x2,x3)=(i,i)b  zi, 2 = l[(x2>x3)=(i,o)]> 
Zi, 3 = l[(x2,A'3)=(o,i)b  and  ziA  = 1[(x2,a3)=(o,o)]-  Posterior  quantities,  B.E.,  S.D.,  q.025, 
9.05)  9.25  : 9.5  : 9.75  ) 9.95:  and  q, 975  for  (3j  and  7 j are  presented  in  Table  3.4.  The  Bayesian 
model  diagnosis  scatter  plots  are  given  in  Figure  3.4. 


Table  3.4.  ECOG  Posterior  Estimates  under  Interaction  Model  and  7 tju,  nuu 


6 1 r 

dl 

02 

03 

04 

7i 

72 

73 

74 

B.E.  7 tju 
S.D. 

-2.930 

0.160 

-3.577 

0.298 

-3.505 

0.127 

-3.572 

0.260 

0.509 

0.326 

1.069 

0.668 

0.696 

0.271 

0.400 

0.525 

B.E.  -kuu 
S.D. 

-2.908 

0.157 

-3.493 

0.288 

-3.488 

0.130 

-3.503 

0.256 

0.501 

0.322 

0.998 

0.620 

0.687 

0.270 

0.382 

0.514 

9.025  TTjf/ 
KUU 

-3.216 

-4.181 

-3.744 

-4.102 

-0.118 

-0.113 

-0.117 

-0.071 

0.167 

0.175 

-0.607 

-0.613 

9.05  7 Tju 

77(7(7 

-3.159 

-4.069 

-3.699 

-4.004 

-0.019 

-0.012 

0.065 

0.120 

0.250 

0.258 

-0.445 

-0.446 

9.25  7TJ(7 

77(7(7 

-2.987 

-3.744 

-3.563 

-3.717 

0.289 

0.300 

0.639 

0.726 

0.507 

0.518 

0.050 

0.066 

9.50  7Tj(7 

77(7(7 

-2.872 

-3.535 

-3.472 

-3.532 

0.506 

0.520 

1.061 

1.176 

0.689 

0.702 

0.396 

0.424 

9.75  7Tj(7 

77(7(7 

-2.761 

-3.339 

-3.383 

-3.355 

0.727 

0.744 

1.513 

1.663 

0.875 

0.889 

0.750 

0.792 

9. 95  717(7 

77(7(7 

-2.606 

-3.074 

-3.259 

-3.117 

1.055 

1.077 

2.238 

2.458 

1.150 

1.168 

1.284 

1.349 

9.975  7Tj(7 

77(7(7 

-2.557 

-2.992 

-3.220 

-3.043 

1.164 

1.188 

2.499 

2.748 

1.242 

1.466 

1.466 

1.539 

The  linear  contrasts  of  /?/ s are  considered,  particularly,  the  relative  hazard  rates 
between  categories  in  the  logarithm  scale  (or  linear  contrasts  of  /3y  ’s) . Their  Bayesian 
posterior  estimates  and  quantiles  under  ttuu  (or  7 tju)  are  presented  in  Table  3.5. 
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observation  ID 
(a)  Fit  for  T 


observation  ID 
(b)  Fit  for  A 

Figure  3.4.  Goodness-of-Fit  Plot  for  Failure  Observations  under  Analysis  2:  (a) 
Plotting  Bayesian  Predictive  Coverage  Probabilities  for  7V s;  (b)  Plotting  Bayesian 
Predictive  Coverage  Probabilities  for  Aj’s. 
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Table  3.5.  Posterior  Estimates  for  Contrasts  of  Interest  under  7 Tuu  (or  Kju) 


Contrasts 

p3  — Pi 

Pi  - P3 

Pi  — P2 

B.E. 

0.058 

0.584 

0.623 

S.D. 

0.282 

0.207 

0.320 

<7.  025 

-0.501 

0.165 

0.064 

<7.050 

-0.409 

0.234 

0.234 

<7.250 

-0.126 

0.451 

0.401 

9.500 

0.052 

0.587 

0.605 

<7.  750 

0.237 

0.720 

0.826 

<7.950 

0.543 

0.920 

1.175 

<7.975 

0.646 

0.988 

1.299 

3.7.2. 3 Summary 

• Goodness-of-fit  of  Bayesian  procedure  is  demonstrated  in  both  analyses.  Based  on 
these  justifications,  no  evidence  of  lack-of-fit  is  revealed  in  either  analysis. 

• Priors  itju  and  ttuu  used  in  Analysis  2 are  second  order  probability  matching  priors 
when  /3i,  fa,  P3,  Pi,  or  their  differences  are  the  parameters  of  interest.  For  7/s,  due 
to  the  discreteness  of  the  underlying  distribution,  the  matching  concept  cannot  be 
justified  rigorously.  However,  if  one  formally  extends  the  matching  concept  from  the 
continuous  case  to  the  discrete  case,  then  ttju  is  indeed  a second  order  probability 
matching  prior. 

• We  provide  an  example,  where  both  nonidentifiability  of  the  likelihood  and  censoring 
are  present.  The  composite  prior,  nevertheless,  provides  a satisfactory  matching 
performance  for  parameters  of  primary  interest. 


CHAPTER  4 

GEOMETRIC  COMPETING  RISKS  MODELS 
4.1  Introduction 


In  Chapters  2 and  3,  Bayesian  inference  was  considered  for  competing  risks  mod- 
els originating  out  of  various  BVE’s.  The  nonidentifiability  was  overcome  by  the 
knowledge  of  the  underlying  model  and  prior  belief.  Suppose  that  the  intrinsic  bi- 
variate lifetime  mechanism  is  not  clearly  specified.  One  shall  not  interpret  the  result 
based  on  an  ad  hoc  model  assumption.  In  this  chapter,  we  focus  attention  on  a class 
of  stochastic  models  for  (T,  T,  A),  which  does  not  require  exponential  marginals  or 
the  independence  of  T and  (4>,  A)  in  advance.  As  we  will  see  later,  one  also  avoids 
nonidentifiability  from  such  modeling  scheme.  In  particular,  we  consider  discretized 
lifetime  T.  This  occurs  for  instance  when  lifetime  T is  screened  periodically  due  to 
resource  limitation  or  due  to  economic  reason,  sometimes  even  due  to  intentional  dis- 
cretization. The  stochastic  modeling  scheme  is  stated  in  Section  4.2.  Based  on  the 
specification  of  the  stationary  survival  probabilities,  generalized  geometric  competing 
risks  models  (GEM)  are  constructed,  which  feasibly  feature  a variety  of  lifetime  phe- 
nomena, e.g.,  T has  the  piecewise  exponential  marginal.  Interestingly  enough,  this 
approach  reflects  the  mathematical  evidence  of  the  step  function  approximation  to 
the  true  survival  function.  In  Section  4.3,  Bayesian  inference  under  Laplace’s  and  Jef- 
freys’ priors  are  derived.  The  ECOG  dataset,  introduced  in  Section  4.4,  is  reanalyzed 
under  the  geometric  model  assumption. 
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4.2  Generalized  Geometric  Models  and  Likelihood  Functions 


Kaplan-Meier  estimator  or  product-limit  estimator  (Kalbfleisch  and  Prentice,  1980 
and  Cox  and  Oakes,  1984)  provides  a stochastic  modeling  for  analyzing  univariate 
survival  data.  The  survival  curve  is  estimated  by  a step  function  under  certain  as- 
sumptions. Dirichlet  process  (Ferguson,  1973)  for  modeling  the  survival  function, 
and  Levy  process  (Kalbfleisch,  1978)  or  Beta  process  (Hjort,  1990)  for  modeling  the 
underlying  hazard  function  are  some  of  the  well-known  Bayesian  nonparametric  al- 
ternatives. Clayton  (1991),  Aslanidou  and  Dey  (1996),  and  Sinha  and  Dey  (1997) 
extended  these  ideas  to  the  analysis  of  multiple  events,  in  which  case  the  lifetime 
process  is  adjusted  for  covariates.  This  is  also  found  convenient  in  the  analysis  of 
discretized  survival  data  or  interval  censoring.  We  generalize  this  idea  to  establish  a 
general  class  of  competing  risks  models,  which  can  in  particular,  feature  the  likelihood 
of  the  BVE  without  suffering  from  the  nonidentifiability. 

Let  0 = x0  < Xi  < • • • < xL+i  = oo,  where  Xi,---,xL  denote  the  designed 
time  points  for  survival  screening  or  the  intentional  discretization  of  data  in  the 
study.  Define  77  = (xi-i,xi\  for  1 < l < L.  Let  H;  = (T^l\  A^,  z)  denote 
the  lifetime  history  of  (T,A,\fr)  in  77  given  z(h  1),  where  T ^ is  the  indicator  of 
failure  in  77,  and  A®  and  record  the  status  of  A and  'I'  at  77,  respectively.  Define 
H0  = (T(0\  A(0),  4d°L  z)  = (0,  — 1,  — 1;  z).  The  three  correlated  stochastic  processes 
are  defined  as  follows. 

T(l)  = l[reTj]> 

r 0 if  # = 0,  r(‘>  = 1 

t(')  = l 1 if  $ = 1,  r<‘>  = 1 , 

[ -1  if  t®  = 0 


and 
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A(0 


0 

1 

-1 


if  A = 0,  tfW  = 1,  T®  = 1 
if  A = 1,  = 1,  T®  = 1 . 

if  (0  = T<0  = 0 


T('),  A^),  and  \f0)  essentially  comprise  three  monotone  increasing  stochastic  pro- 
cesses. Define  0°  = 1,  H;+  = (T^  = 0,  \&0),  A(0),  and  H(~~  = (T®  = 1,  A^O). 

Assume  that  P( H0)  = 1,  P(TW  = IIHjIj)  = 1,  P(A«  = -l,^')  = -1| T®  = 

0,  H+ J = 1,  P(tfW  = i|^-x)  = j.HjIj)  = 1,  and  P(A«  = jjA0-D  = = 

1, Ht-1)  = l,fori  = 0,l,  l</<  L.  The  next  table  exhibits  the  possible  transition 
routes  of  the  joint  process. 


Table  4.1.  Stochastic  Transition  Status 


H0 

H(-i 

H; 

Remark 

(o,-i, -i;z) 

(0,-1, -l;z) 

(0,-1, -l;z) 

Censored 

(0,-1, -l;z) 
(1,0, -l;z) 

(1,0, -l;z) 
(1,0, -l;z) 

Simultaneous 

Failure 

(0,-1, -l;z) 
(1,1, 0;z) 

(1,1, 0;z) 
(1,1, 0;z) 

Cause  2 
Failure 

(0,-1, -l;z) 
(1,1,  l;z) 

(1,1, l;z) 
(1,1, l;z) 

Cause  1 
Failure 

The  likelihood  of  (t,5,ip)  in  terms  of  (t^°\  V^°\  6^  (t^,  ^L\  S^)  is  given 

by 


p(  H0) 

\i= i 

= P( H0)  (n ^(t(,)|H,_1)^(0|t(,), *(0, j . (4.1) 

The  likelihood  function  is  then  determined  by  the  transition  probabilities  P(f^)|HjtL1), 
ItWjHitj),  and  P(5(|)|^(|))t(0j H^).  Let  Ft(.),  F^(.),  and  F5(.)  be  differen- 
tiable distribution  functions,  and  these  are  given  by 
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F?  = P(T«  = l|Hi_1)  = F<(z'A;ri) 

= p(t®  = i|h,_x)  = 1 


if  H/_!  = H/lj, 
if  H;_!  = H^li; 


(4.2) 


F(0  = p(tf  (U  = 1|T(0)  H,_!)  = F*(z'a«; Ti)  if  = 1,  H;_!  = H+  15 

= P(^')=i|T(,),H/-i)  = l if^-1)=j,H«_1=Hr_1;  (4-3) 

Fj°  = P(AW  = l|^),T«,Hi_1)  = P(5(z,7i;r,)  if  = T«  = 1,  = H+  1; 

= P(A{/)  = i|^(0,T(i), , H,_i)  = 1 if  A«  = j,  H(_!  = Hr_i;  (4.4) 


for  j = 0, 1,  1 < l < L.  Let  F — 1 — F.  The  probabilities  of  nonsingularity  and  cause 
1 failure  by  the  end  of  the  study  are  respectively  given  by 

/>(*  = lp-w  = i)  = £{<  n npdX’t  <4-5) 

i=i  o <l'<l 

P(A  = 1| r(1>  = 1)  = £{(  II  F^F^FfFf}.  (4.6) 

/=!  0 <l'<l 


Let  (31  = (/3j,  • • • , (3'L),  a'  = (a'x,  • • • , a'L),  i = (7i,  • ■ • ,7l),  and  1,  = l[o<p(#=i)<i]. 
The  joint  likelihood  function  becomes 


L[0,  a,7|Ho,  , Hl)  = I] 


/=1 


FP  ( ( pj;)  )<5(0(  F^  )1~<5(0  ) 


■0(0 


t(  0 


( Fr ) 


(0  \1  -t(0 


(4.7) 


Under  the  noninformative  censoring  assumption,  the  marginal  likelihood  for  cen- 
soring within  T/  is  given  by 

n fP-  (4.8) 

0 <l'<l 
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Denote  a generalized  geometric  variable  Y by  Y ~ Geo (L\pi,  • • • ,pl)  if  it  has  pdf 
of  the  form 

P(Y  = i)  = ( IJ(1  -pi')\  pi+i  for  0 < l < L, 

\l'= o / 

= II(1  ~Pi')  for  1 = L > (4-9) 

i'=i 

m 

where  p0  = 0 and  0 < pi  < 1 for  1 < i < L.  Clearly,  £ P(F  — l)  = 1.  Moreover, 

z=i 

(4.9)  presents  a variant  of  a regular  geometric  distribution  function,  which  is  subject 
to  truncation  and  different  “success”  probabilities.  By  this  definition,  one  obtains 

E V>=0]  = E l|,..fc.i]  = E ~ Geo(I;  F,m,  • • ■ , F,(1)).  (4.10) 

Z=1  1=1  1=1 

(4.2)-(4.4)  are  thus  referred  to  as  the  generalized  geometric  competing  risks  model 
(GEM).  The  truncated  geometric  model  proposed  by  Mori,  Woolsen,  and  Woolworth 
(1995)  is  actually  a special  case  of  this  model,  where  the  transition  probability  func- 
tion is  the  logistic  distribution. 

This  generalized  geometric  model  with  appropriate  specification  of  F’s  allows  for 
characterizing  a variety  of  bivariate  lifetime  structures.  For  instance,  if  T follows  the 

piecewise  exponential  distribution  (Breslow,  1974),  then  Ffl\pi)  = 1 — 

This  family  of  distributions  broadly  encompasses  many  lifetime  distributions  via  its 
smooth  mathematical  approximation.  Suppose  that  /3X  = • • • = (3L,  a ! = •••  = aL, 
and  71  = • • • = 7l.  Particularly,  let  X\,  ■ • • , xL  be  chosen  evenly.  It  leads  to  a simpler 
version  of  geometric  models  with  “equal  transition  probabilities”,  i.e. , 

Ft  = F?\  F^  = F^\  Fg  = Fgl\  (4.11) 

for  1 < l < L.  The  likelihood  function  under  (4.11)  is  then  given  by 

L(/3,cx,  7|H0,---,Hl) 
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L ( l 'l  t(,) 

F?(  Ft  n|[(F,  fl)  ( Fs  )1-5(,,]V’(,)  [( F^  )^l\F^  )1^(,)]  1s  I ,(4.12) 


where  w is  the  realization  of  W = ]T  1[T(()=0]  with  W ~ Geo(L,  Ft,  • • • , Ft ).  And 


/=i 


m = i|r(1)  = i)  = £{(^)'-1r,F„}, 

1=1 

P(A  = 1|X^  = 1)  = Y/{(F,y-'F,FtFs}. 


(4.13) 

(4.14) 


Further,  suppose  that  Ft(y ) = 1 — e y and  F^(y)  = Fg(y)  = ey/(l  + ey).  Then 
a family  of  singular  BVE’s  is  obtained,  where  the  continuous  waiting  time  T follows 

Cox’s  proportional  hazard  model,  i.e. , T G(l,ez'^)  and 


..z'a 


1[®(0=i|t(0=i]  ~ Bin  ^1,  l + gz,a  j , 

( eZ'7  ^ 

l[A(o=i|:r(o=«-(n=i]  ~ Bin  II,  gZ,7  I ■ 

Consequently,  under  P(ls  = 1 ) = 1,  (4.7)  matches  the  generalized  ACBVE 
likelihood  function  derived  in  Chapter  3 in  the  absence  of  the  indicator  multiplicator. 

Next,  we  derive  noninformative  priors  and  the  corresponding  posteriors  based  on 
(4.7)  and  (4.12). 

4.3  Bayesian  Analysis 

Laplace’s  prior  and  Jeffreys’  prior  are  employed  in  the  Bayesian  analysis. 


4.3.1  Likelihood  Function 


Let  T-l\  and  denote  the  indicators  of  failure,  nonsingularity,  and  cause  1 
failure  in  p for  individual  i,  1 < i < n+.  Suppose  that  5^)  is  the  realization 
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of  (T-l\  T \l\  A-^),  respectively.  Let  failures  are  indexed  by  1,  • • • , n,  and  individuals 
who  have  earlier  dropout  before  xl  are  indexed  by  n + 1,  Define  R0  = 

{1,  •••,«+},  and  for  1 < l < L,  Ri  = {i  : =0,1  < i < n+ },  Ri  = {i  : i E 

Ri-i  and  — 1,1  < i < n},  and  1 = {i  : i E Ri  and  = 1}.  Denote 
F$  = Ft( z'A),  Kll  = and  F$  = F,(. z'7i).  Under  (4.2)-(4.4),  and  (4.8), 

the  joint  likelihood  function  for  {(U,  8i,  ipt)  : 1 < i < n+]  is  given  by 


L((3,  a,  7|d)  = Lt((3\d)LMd)Lsh\d), 


where 

Lt(f3\d)  oc 
Lj,(at |d)  oc 
Ls(j\d)  oc 


(4.15) 

(4.16) 

(4.17) 

(4.18) 


The  posteriors  associated  with  Jeffreys’  prior  and  Laplace’s  prior  are  calculated 
as  follows. 


4.3.2  Laplace’s  Prior  and  the  Posterior 


Suppose  that  Laplace  uniform  prior  is  assumed.  The  posterior  density  is  propor- 


ivu = vrc/(/3,  a,  7)  oc  1, 


tional  to  (4.15),  i.e. , 


7rt/(/3,a,7|d)  oc  L(/3,  a,  7|d). 


(4.19) 

(4.20) 
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Apparently,  the  posterior  marginals  are:  nu((3\d)  oc  Lt(/3\d),  7T[/(a|d)  oc  L^{cx |d), 
and  7T[/(7|d)  oc  Ls(y\d).  Let  A C {1,-  • •,«+},  and  Z'A  denote  the  sub-matrix  of  Z' 
consisting  of  column  vectors  of  Z'  with  indices  belonging  to  A.  Using  Shao  and  Chen’s 
(1998)  results,  the  propriety  of  iru(/3,ct,y\d)  holds  if  all  the  following  requirements 
are  met. 

(C.4.1)  Z - (i)  is  of  rank  k,  for  1 < / < L since  i D Ri  A \ 

rti 

(C.4.2)  Let  Utfp  U^,  and  U^n  be  the  diagonal  matrices  with  the  ith  diagonal 

elements  given  by  (— l)^  for  i E Ri,  (— 1)A-  for  i E Ri,  and  (— I)5;  for  i E 
R^,  respectively.  Then  it  requires  that  [ZRl]'\J  Rlvl^  = 0,  [ZjjJ'U^v^  = 0,  and 
[Z^(o]'U^(i)V^  = 0,  for  some  positive  vectors  v[l\  and  for  1 < / < L. 

OO  OO  OO 

(C.4.3)  / |u|fc  dFt{u ) < oo,  / |«|fc  dF^{u)  < oo,  and  f \u\k  dFg(u ) < oo. 


— OO 


— OO 


— oo 


Under  (4.11)  and  Laplace’s  prior,  the  joint  posterior  is  given  by 


7T(,(/3,  a,  7|d)  oc  7rlI(/3|d)7ru(a|d)ir[,(7|d) 


(4.21) 


where 


(4.22) 


l=1iehi 


(4.23) 


(4.24) 


Let  = {i  : i E [j  R^}  and  R = {i  : i E U Ri}.  Then  besides  (C.4.3),  the 


propriety  of  (4.21)  is  ensured  under  the  following  assumptions. 

(C.4.4)  Z^d)  satisfies  the  full  column  rank  assumption  since  R D R^ . 
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(C.4.5)  Let  Un+,  U^,  U^(i)  be  the  diagonal  matrices  with  the  ith  diagonal  elements 
given  by  ( — l)1ii>"+1]  for  1 < i < n+,  ( — 1)^  for  i G R and  (— I)5*  for  i G R^, 
respectively.  The  propriety  requires  that  [Z]'Un+vt  = 0,  = 0,  and 

[Z^pI'U^dv^  = 0,  for  some  positive  vectors  vt,  and  v^. 


4.3.3  Jeffreys’  Prior  and  the  Posterior 


Let  f®(y)  — dF^\y)/dy.  Dehne  three  diagonal  matrices  (n+  x n+), 

(n  x n),  and  (n  x n),  and  their  ith  diagonal  elements  are  accordingly  given  by 


f 2 

Jt,i 


(/') 

(0  B(l)  I t 11  ^ t,i  ( > 


EAd  rpV. 

rt,i  rt,- 


rib 

j'<i 


(4.25) 


(b  — 


M^(/3,a)  = 


/,2 


ip,i 


^(0  ^(0 
'ip, l If), l , 


n bP ) b?  i . 

\v<i 


(4.26) 


A4?  = Af$(/J,a,  7) 


fl 


Fxl)  Fil) 
0,1  0,1 


n bP ) bfb',’ 

\i'<i 


(4.27) 


Let  M*  = ©[Z'M^Z,  • • • , Z'MfL)Z],  M*  = ®[Z;M^Zn,  • • • , Z;M^Zn],  and 
M,5  = ®[Z(jM^Z„,  • • • , ZJ,M^Z„].  The  Fisher  information  matrix  is  given  by 

1(0,  a,  7)  = ®[Mt,M*,M,]  (or  ®[Mt,M,]),  (4.28) 

which  depends  on  the  assumption  of  singularity.  Applying  the  same  technique  used 
in  Chapter  3,  Jeffreys’  prior  is  calculated  as 

7Tj(0,a,7)  a n|(£c(zu,-^zO«) 

1=1  ( \p(k)  U= 1 / 


f £ c(zu,  • • 

\pik) 


) n Mil 


u=  1 


X 


87 


/ 

\ Is 

fc  \ 

E c(zu  > * 

-.OIK. 

\pik) 

“=i  / J 

(4.29) 


where  Pw  = {(*1,  •■•,**)  : c(zn,  • • • , zik)  >0,  1 < H < *2  < • • • < **  < «+}  and 
Pn]  ~ {(*1j  * * -Pk)  ■ c(  zin---,zik)  >0,  1 < ii  < i2  < ■ ■ ■ < ik  < n}.  We  shall  prove 
in  the  next  theorem  that  the  Jeffreys’  prior  given  in  (4.29)  is  indeed  a proper  pdf 
provided  that  Zn  is  of  rank  k. 

Theorem  4.1  Assume  that  Zn  is  a full  rank  design  matrix.  Let  7Tj(/3,  a,  7)  be  given 
in  (4.29),  then 


/-/  a,  7)  d(3  da  d~f  < 00. 


Proof  Since  0 < F < 1,  one  obtains  the  inequalities 


(0\2 


(W) 


M()  < 

■.*  ■,* 


(4.30) 


Applying  Jensen’s  inequality,  one  has 


tt7(/3,q;,7) 


< Ili  E[c(zu,- 

Z=1  lp(fc) 


/ k Al)  \ 


L 

x n 


E [c(zn , ' " " > zifc)]^  n 


4 


(0 

iy, 


,W 


.=1  (bids,)1 


x n< 

z=i 


EKzq,---,^)]^  J] 


a(*> 


Ai) 

J ip, i-u 

i <di  d',L)5 
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Using  the  change-of-variables  = z 'iu(3h  for  (ix,  • • • ,ik)  e P^k\  the  propriety  of 

7Tj(/3,  a,  7)  follows  from  the  finiteness  of  the  beta  integral 


/ 


f(y) 


(f(v)fm) 


t iy  = (r(.5))  = n. 


(4.31) 


Next,  we  consider  the  reduced  model  given  in  (4.11).  Jeffreys’  prior  is  given  by 
7T/(/3,a,7) 

lp(»)  tl=l  \VMu  \Z=1  > 


X 


k r r2  / l 

J S,iu 


p(fc)  u=l  *«>*»/  \Z=1 


x 


E[c(zu>--->zij]  n 


d(^) 


U=1 


/2 


1p,iu 


(-U/),iu  Pip, in)  \l= 1 


iji 


1S  1 2 


. (4.32) 


Since  E(FM)*  1 < L and  £(FM)*  lFt}i  < 1,  /7Tj(/9,a,7)  dpdad'y  is  proper 
z=i  z=i 

according  to  Theorem  4.1.  The  corresponding  posterior  is  thus  proper  if  Zn  is  of  rank 

k. 


4.4  Application  to  a Sample  with  Categorical  Covariates 


As  cited  in  Section  3.5,  when  a sample  is  finitely  classified  into  k categories,  the 
parameters  associated  with  category  j are  independent  of  those  associated  with  any 
other  category.  Thus,  x/’s  may  be  independently  assigned  for  each  category.  Let 
Lj  denote  the  number  of  time  grids  for  category  j,  i.e.,  the  cut-off  time  points  are 

0 = xjo  < xh  < ■■■  < xjL.  < xjLj+i  = 00.  Let  /3'  = (/?j1}, • • • ,/?fj)),  <*'■  = 
(a^,  • • • , ajL^),  and  7'  = (7]^,  • • • , 7^)  be  the  parameters  associated  with  category 
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j,  1 < j < k.  Denote  f3'  = (#,•••,  (3'k),  a'  = (a'lf  • • • , a'k),  and  V = (Vi,  • • ■ , 7*). 
Define  n®  = E 4?  = E 4°(1  - 45,  = E (1  - 

jeftz(1)  ie£z(1)  jefi; 

)1[^,J=i]>  n ? = ni-  + «2*-  + n$  , and  n(/)  = £ (1  - 4°)l[^j=i]-  Denote  = 

3 3 ieRi  3 

Ft((3j l'>),  F®  = Fxp(ct-'>),  and  F®  = F^{^).  The  likelihood  function  becomes 


L(/3,a,7|d)  = 


j= i D=i L 


(i) 


x njrik^yr^t^yr®' 

j=i  D=i  L 


(4.33) 


Accordingly,  Jeffreys’  prior  is  given  by 


7rj(/3,a,7)  oc  nUl  W 
i= i 1 1=1 


( n [( rj?  J1'-'-*  ( A0  )-i  f$l  fill 


7= 1 1 1=1  L 


1 Is 


(4.34) 


The  posteriors  under  Laplace’s  prior  and  Jeffreys’  prior  are  respectively  given  by 
77/(0,  a,  7|d)  oc  L(/3,  a,  7(d)  and 


7Tj(0,a,7|d) 


k ( 


<*  n n 

7=1  1 1=1 


W)1 


(0  \n{lhLj-l-\f  F(l) 


( I ) 


11  "so-'  4° 


x n(nf(d;))”<‘?_hA'))“?'M? 

7=1  1 1=1  L 


k f Lj 


x niff 

7 = 1 1 1=1  L 


n ls 


(4.35) 
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The  criteria  for  the  propriety  of  iru(/3,ai,  7|d)  are  > 0 and  > 0 under 

the  nonsingular  model  assumption,  > 0 is  required  additionally  for  the  singular 
model. 

4.5  Data  Analysis 

The  ECOG  lung  cancer  example  is  revisited  in  this  Section.  We  consider  the  sam- 
ple categorization  defined  in  Section  3.7.  The  time  grids  for  each  category  are  chosen 
separately.  A natural  way  to  choose  the  time  grids  is  based  on  the  Kaplan-Meier 
scheme,  distinct  failure  times  being  chosen  as  time  interval  ends.  In  this  example, 
the  joint  posterior  is  improper  when  Laplace’s  prior  is  assumed  since  some  of  ra^’s 
or  n^’s  are  zeros.  We  thus  consider  only  the  posteriors  associated  with  Jeffreys’ 
prior.  No  particular  forms  of  F^’s  are  specified.  We  focus  on  the  inference  of  F^’s. 
Indeed,  the  posterior  distributions  of  /3^'s  and  7^’s  can  be  obtained  via  one-to-one 

transformation  from  F/^’ s and  F^’s,  respectively.  The  likelihood  of  ACBVE’s  can 
be  viewed  as  that  of  a GEM  with  one  grid  (cf.  Section  4.2).  We  compare  the  results 
derived  under  the  two  model  assumptions.  Table  4.2  presents  posterior  means  and 
standard  deviations  of  cause  1 failure  probabilities  under  the  ACBVE  and  the  GEM. 

Table  4.2.  Comparison  of  Posterior  Means  and  Standard  Deviations  for  P( A = 
1|T(L)  = 1)  under  the  ACBVE  and  the  GEM 


(Xi.Xj) 

Model 

B.E. 

S.D. 

(1,1) 

ACBVE 

GEM 

.6220 

.5514 

.6264 

.1510 

(1,0) 

ACBVE 

GEM 

.7308 

.5770 

.5481 

.1938 

(0,1) 

ACBVE 

GEM 

.6639 

.5853 

.6667 

.1061 

(0.0) 

ACBVE 

GEM 

.5938 

.5298 

.6056 

.2132 
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„(') 


Figure  4.1.  Bayesian  Goodness-of-Fit  Procedure  for  n[^  ’s:  plotting  posterior  predic- 
tive matching  probabilities  of  ’s. 

The  Bayesian  goodness-of-fit  procedure  for  n^’s  is  presented  by  the  scatter  plot 

of  p (o  given  in  Figure  4.1,  where  p (i>  is  computed  from  (3.51). 

b b 

It  suggests  that  the  geometric  model  is  preferable  in  terms  of  prediction  (cf.  Figure 
3.4).  From  Table  4.3,  the  posterior  standard  deviations  of  P( A = 1|T^L^  = l)’s 
are  substantially  smaller  under  the  GEM  in  comparison  with  the  ACBVE.  It  thus, 
evidences  that  the  stochastic  modeling  enhances  the  precision  of  estiate. 

The  estimated  survival  curves  under  the  two  models  are  drawn  along  with  the 


i 


- nr\ 

data  empirical  quantiles  of  f]  -F*  in  Figure  4.2. 

/'= i 
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(c)’ 


(d)T 


Figure  4.2.  Survival  Curves  under  the  ACBVE  and  the  GEM  in  Comparison  with 
the  Data  Empirical  Quantiles:  (a)  estimated  survival  curves  for  (Xi,X2)  = (1,1); 
(b)  estimated  survival  curves  for  (Xi,X2)  = (1,0);  (c)  estimated  survival  curves  for 
(Xi,X2)  = (0, 1);  (d)  estimated  survival  curves  for  (Xi,X2)  = (0,0). 
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It  reveals  consistence  between  the  estimated  survival  probabilities  under  the  GEM 
assumption  and  the  observed  data,  which  again  supports  the  advantage  of  incorpo- 
rating Bayesian  method  with  this  modeling  strategy. 


CHAPTER  5 

SUMMARY  AND  FUTURE  RESEARCH 


Competing  risks  models  often  lead  to  nonidentifiable  likelihoods.  The  resulting 
inferential  problem  can  be  resolved  by  employing  Bayesian  methodology  with  infor- 
mative priors  (cf.  Section  2.7)  or  stagewise  noninformative  priors  (cf.  Chapters  2 
and  3),  where  the  underlying  models  are  assumed  to  be  BVE’s,  or  utilizing  a more 
general  class  of  stochastic  models  for  the  competing  risks  outcomes  (cf.  Chapter  4). 
In  the  simulation  study  demonstrated  in  Section  2.7,  we  find  that  with  the  use  of 
informative  priors,  Bayesians  successfully  guide  the  inference  of  even  nonidentifiable 
parameters.  The  present  research,  however,  focuses  primarily  on  Bayesian  method 
with  noninformative  priors.  The  proposed  stagewise  priors  are  appealing  due  to  vari- 
ous probability  matching  properties.  In  Chapter  4,  a class  of  flexible  competing  risks 
models  is  proposed,  which  unlike  BVE’s,  does  not  require  a specific  model  assump- 
tion. Also,  even  without  any  model  specification,  when  covariates  are  categorical, 
Laplace’s  and  Jeffreys’  priors  lead  to  beta  posteriors  for  the  transition  probabilities. 
Moreover,  for  analysis  of  a particular  dataset,  precision  of  estimates  is  enhanced  by 
this  flexible  modeling. 

The  mirror  side  of  competing  risks  is  referred  to  as  “complementary  risks” , where 
the  data  consist  of  Tc  = max(T1,T2),  and  Ic  — jl[T3_j<T,]>  for  instance,  a parallel 
system  of  two  components.  The  likelihood  of  (Tc,  Ic ) once  again  is  not  fully  identifi- 
able. As  in  our  case,  this  nonidentifiability  can  be  viewed  as  a consequence  of  data 
incompleteness.  Incorporating  Bayesian  methodology  into  such  cases  can  result  in 
fruitfully  inferential  procedures. 


94 


APPENDIX 

PROOFS  OF  MATCHING  PROPERTIES  OF  nJV  AND  ttj 


First  order  probability  matching 


since  and  n(9)  are  free  from  9. 

Second  order  probability  matching 

We  first  evaluate  = p1Irl/In  and  Ljru  = EQ{dzl{9)/d9jd9rd9u },  1 <r,u< 

k 

k.  Clearly,  and  Ljru  = ^.^0,^0,^  are  all  independent  of  6.  Thus 


i- 1 


HPD  matching 


Since 


E0{dl(9)/d9Jd2l{9)d9rd9u} 


0 


and  dirju(6)/d9r  — dirj(9)/d9r  = 0,  the  HPD  property  is  satisfied. 
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